Keeping It Real


I got my first introduction to real project planning 10 years ago when Capability
Maturity Model® Integration founding father Watts Humphrey came to Hill Air
Force Base to pilot the Team Software Process (TSP) with our TaskView project.
As a project manager with more than 11 years of software experience, including three
years in our software engineering process group, and as a certified Personal Software
Process instructor, I was confident the plan I had led the TaskView team to construct
was flawless. Before Watts’ visit, we had spent several days defining all of the product
components and used a modified Delphi approach to estimate the duration of each one. We had
determined resources, made assignments, built a detailed Gantt chart with four dozen tasks,
identified all the dependencies, planned for every milestone, and determined the critical path.
We knew each and every deliverable, its customer, format, and need date. We were ready, or so
I thought.

Over the next week, I watched as Watts worked painstakingly with our team members to create
a real project plan, one that each engineer not only helped to create, but could use to guide
his or her daily activities. Our meager four-dozen-task Gantt chart was replaced by a more than
400-task Earned Value Plan, estimated by both a top-down and a bottom-up approach; our risks
were identified, recorded, categorized, prioritized, and assigned for follow-up; and a
never-before-conceived-of quality plan was generated.

All  of  this  was  for  a  six-month  project  of  six  people  ...  and  the  results  of  this  launch  were 
staggering. 

This  plan  was  the  basis  of  each  weekly  review.  We  were  able  to  tell  immediately  when  tasks 
were  falling  behind  schedule.  TaskView  avoided  or  mitigated  all  of  its  critical  risks.  The  quality 
of  our  product  surpassed  anything  we  had  ever  produced. 

While I had known for several years that project planning and tracking were critical, it was
not until this experience that I realized how useful these plans could be. Not only did they guide
our actions, but they provided a basis for stability when requirements inevitably changed. In one
case, for example, I was able to use the planning data to determine quantitatively that I could
loan one of our engineers to another team without risking the TaskView delivery. Since that
time, I have been a staunch advocate not only of the TSP but of taking the time to do real
project planning. These solid, effective plans are worth all of the effort to create and maintain them.

This month’s CrossTalk is filled with wonderful guidance for the critical tasks of planning
and tracking software projects. First, in his article Software Tracking: The Last Defense Against
Failure, software veteran Capers Jones details four worst practices that lead to catastrophic
failure and even litigation on software projects. With his usual prowess, Capers succinctly
uncovers the mines in the minefield so that the rest of us can avoid them!

My two favorite articles in this month’s issue are Does Project Performance Stability Exist? A
Re-examination of CPI and Evaluation of SPI(t) Stability by Kym Henderson and Dr. Ofer Zwikael
(spoiler alert: It does eventually), and Walt Lipke’s Schedule Adherence: A Useful Measure for Project
Management. Both articles focus on Lipke’s new Earned Schedule measure: an exciting, innovative,
and effective new way to make better use of Earned Value planning and tracking data.

As we all know, the most unpredictable portion of any software development or maintenance
cycle is software testing. Dr. David J. Coe explains how to make testing more efficient and
effective in A Review of Boundary Value Analysis Techniques.

Real project planning and tracking begins with making good estimates, and in the capstone
article, Truth and Confidence: Some of the Realities of Software Project Estimation, Phillip G. Armour
details the many issues that make software project estimating unusually difficult and suggests a
fascinating new view of the process (and outcome) that makes estimates more usable.

To keep the attention of the techies, there is also an article from David Premeaux discussing
VoIP Softphones.

This month’s articles will help guide us in making and following highly effective and real
project plans.
Software Tracking:

The  Last  Defense  Against  Failure 


Capers  Jones 
Software Productivity Research, LLC

From working as an expert witness in a number of lawsuits where large software projects were cancelled or did not operate
correctly when deployed, I found that four major problems occur repeatedly: 1) accurate estimates are not produced or are
overruled; 2) requirements changes are not handled effectively; 3) quality control is deficient; and 4) progress tracking fails
to alert higher management to the seriousness of the issues. There are often other problems as well, but these four always
occur in breach of contract litigation.


This article is based on software projects that were in litigation for breach of contract.
It concentrates on four worst practices, the factors that most often lead to failure and
litigation. A previous article dealt with additional problems noted during litigation [1].

For the purposes of this article, software failures are defined as software projects that
exhibited any of the following attributes:

1.  Termination  of  project  due  to  cost  or 
schedule  overruns. 

2.  Schedule  or  cost  overruns  in  excess  of 
50  percent  of  initial  estimates. 

3.  Applications  which,  upon  deployment, 
fail  to  operate  safely. 

4. Lawsuits brought by clients for contractual non-compliance.

Although there are many factors associated with schedule delays and project cancellations,
the failures that end up in court always seem to have four major deficiencies:

1. Accurate estimates were either not prepared or were rejected.

2. Change control was not handled effectively.

3.  Quality  control  was  inadequate. 

4.  Progress  tracking  did  not  reveal  the  true 
status  of  the  project. 

Let  us  consider  each  of  these  topics  in 
turn. 

Estimating  Problems 

Although  cost  estimation  is  difficult,  there 
are  a  number  of  commercial  software  cost 
estimating  tools  that  do  a  capable  job: 
COCOMO  II,  KnowledgePlan,  Price-S, 
SEER,  SLIM,  and  SoftCost  are  examples. 

However, just because an accurate estimate can be produced using a commercial estimating
tool, this does not mean that clients or executives will accept it. In fact, from information
presented during litigation, about half of the cases did not produce accurate estimates at all.
The other half had accurate estimates, but they were rejected and replaced by forced estimates
based on business needs rather than team abilities.

The main reason that accurate estimates were rejected and replaced was the absence of
supporting historical data. Without this, even accurate estimates may not be convincing. A lack
of solid historical data makes project managers, executives, and clients blind to the realities
of software development.

A situation such as this was one of the contributing factors to the long delay in opening
the Denver International Airport. Estimates for the length of time to complete and debug the
very complex baggage handling software were not believed [2].

For more than 60 years the software industry lacked a solid empirical foundation of measured
results that was available to the public. Thus, almost every major software project is subject
to arbitrary and sometimes irrational schedule and cost constraints. However, the International
Software Benchmarking Standards Group (ISBSG), a non-profit organization, has started to
improve this situation by offering schedule, effort, and cost benchmark reports to the general
public. Currently, more than 4,000 projects are available, and new projects are added at a rate
of perhaps 500 per year.

There are other collections of software benchmark data, such as those gathered by the
Gartner Group, David’s Consulting Group, Software Productivity Research, and other companies,
as well. However, this data is usually made available only on a subscription basis to specific
clients of the organizations. The ISBSG data, by contrast, is available to the general public.

Changing  Requirements 

The average rate at which software requirements change is about 1 percent per calendar
month. Thus, for a project with a 12-month schedule, more than 10 percent of the final delivery
will not have been defined during the requirements phase. For a 36-month project, almost a
third of the features and functions may have come in as an afterthought.
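
The arithmetic behind these figures can be sketched in a few lines of Python. This is an illustration only: the function name and the choice to compound the monthly rate over the full schedule are assumptions made for the sketch, not part of the author's data. The 1 percent rate is the average quoted above.

```python
def late_requirements_share(months, monthly_rate=0.01, compound=True):
    """Fraction of the delivered size that arrived after the requirements phase.

    Assumption: requirements grow at `monthly_rate` for every month of the schedule.
    """
    if compound:
        growth = (1 + monthly_rate) ** months - 1   # growth relative to initial size
    else:
        growth = monthly_rate * months
    # Growth is measured against the initial size; the share is measured
    # against the final (initial + growth) size.
    return growth / (1 + growth)


for months in (12, 36):
    share = late_requirements_share(months)
    print(f"{months}-month project: {share:.1%} of the delivery was defined late")
```

Under these assumptions the shares come out at roughly 11 percent and 30 percent, in line with "more than 10 percent" and "almost a third."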

These are only average results. The author has observed a three-year project where the
delivered product exceeded the functions in the initial requirements by about 289 percent. It
is of some importance to the software industry that the rate at which requirements creep or
grow can now be measured directly by means of the function point metric. This explains why
function point metrics are now starting to become the basis of software contracts and outsource
agreements.

Unfortunately, in projects where litigation occurred, requirements changes were numerous
but their effects were not properly integrated into cost, schedule, and quality estimates. As a
result, unplanned slippages and overruns occurred.

In several cases, the requirements changes had not been formally included in the contracts
for development, and the clients refused to pay for changes that substantially affected the
scope of the projects. One case involved 82 changes that totaled more than 2,000 function
points, or about 20 percent of the original size of the initial requirements.

Since the defect potentials for changing requirements are larger than for the original
requirements by about 10 percent, and since defect removal efficiency for changing requirements
is lower by about 5 percent, projects with large volumes of changing requirements also have
severe quality problems, which are usually invisible until testing begins. When testing begins,
the project is in serious trouble because it is too late to bring the schedule and cost overruns
under control.

Requirements changes will always occur for large systems. It is not possible to freeze the
requirements of any real-world application and it is naive to think it is possible. Therefore,
leading companies are ready and able to deal with changes and do not let them become
impediments to progress. For projects developed under contract, the contract itself must include
unambiguous language for dealing with changes.

Quality  Problems 

Effective software quality control is the most important single factor that separates
successful projects from delays and disasters. The reason is that finding and fixing bugs is
the most expensive cost element for large systems, and it takes more time than any other
activity.

Successful  quality  control  involves 
defect  prevention,  defect  removal,  and 
defect  measurement  activities.  The  phrase 
defect  prevention  includes  all  activities  that 
minimize  the  probability  of  creating  an 
error  or  defect  in  the  first  place.  Examples 
of  defect  prevention  activities  include  the 
Six  Sigma  approach,  joint  application 
design  for  gathering  requirements,  usage  of 
formal  design  methods,  usage  of  structured 
coding  techniques,  and  usage  of  libraries  of 
proven  reusable  material. 

The  phrase  defect  removal  includes  all 
activities  that  can  find  errors  or  defects  in 
any  kind  of  deliverable.  Examples  of  defect 
removal  activities  include  requirements 
inspections,  design  inspections,  document 
inspections,  code  inspections,  and  all  kinds 
of  testing. 

Some activities benefit both defect prevention and defect removal simultaneously. For
example, participation in design and code inspections is very effective in terms of defect
removal, and also benefits defect prevention. Defect prevention is aided because inspection
participants learn to avoid the kinds of errors that inspections detect.

As stated earlier, a combination of defect prevention and defect removal activities leads
to some very significant differences in the overall numbers of software defects, compared
between successful and unsuccessful projects [1]. However, additional data now shows that for
projects in the 10,000 function point range the successful ones accumulate development totals
of around 4.0 defects per function point and remove about 95 percent of them before delivery to
customers. In other words, the number of delivered defects is about 0.2 defects per function
point, or 2,000 total latent defects. Of these, about 10 percent, or 200, would be fairly
serious defects. The rest would be minor or cosmetic defects.

By contrast, the unsuccessful projects accumulate development totals of around 7.0 defects
per function point and remove only about 80 percent of them before delivery. The number of
delivered defects is about 1.4 defects per function point, or 14,000 total latent defects. Of
these, 20 percent (or 2,800) would be fairly serious defects. This large number of latent
defects after delivery is very troubling for users. The large number of delivered defects is
also a frequent cause of litigation.
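
The comparison between successful and unsuccessful projects follows directly from the defect potential, the removal efficiency, and the share of serious defects quoted above. A minimal sketch of that arithmetic, with a hypothetical helper name and rounding chosen for this illustration:

```python
def delivered_defects(size_fp, defects_per_fp, removal_efficiency, serious_fraction):
    """Return (latent defects at delivery, serious latent defects)."""
    total = size_fp * defects_per_fp                    # defects created during development
    latent = round(total * (1 - removal_efficiency))    # defects that escape to delivery
    serious = round(latent * serious_fraction)
    return latent, serious


# Figures quoted in the text for projects of 10,000 function points.
successful = delivered_defects(10_000, 4.0, 0.95, 0.10)
unsuccessful = delivered_defects(10_000, 7.0, 0.80, 0.20)

print(successful)    # (2000, 200)
print(unsuccessful)  # (14000, 2800)
```

Seen this way, the unsuccessful projects deliver seven times as many latent defects and fourteen times as many serious ones.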

Unsuccessful projects typically omit design and code inspections and depend purely on
testing. The omission of up-front inspections causes three serious problems: 1) The large
number of defects still present when testing begins slows the project to a standstill; 2) The
bad fix injection rate for projects without inspections is alarmingly high; and 3) The overall
defect removal efficiency associated with only testing is not sufficient to achieve defect
removal rates higher than about 80 percent.

Software  Milestone  Tracking 

Those readers who work for the Department of Defense or for a defense contractor will note
that the earned value approach is only cited in passing. There are several reasons for this.
First, none of the lawsuits where the author was an expert witness involved defense projects,
so the earned-value method was not utilized. Second, although the earned-value method is common
in the defense community, its usage among civilian projects, including outsourced projects, is
very rare. Third, empirical data on the effectiveness of the earned-value approach is sparse. A
number of defense projects that used earned-value methods have run late and been over budget.
There are features of the earned-value method that would seem to improve both project
estimating and project tracking, but empirical results are sparse.

Once  a  software  project  is  under  way, 
there  are  no  fixed  and  reliable  guidelines  for 
judging  its  rate  of  progress.  The  civilian 
software  industry  has  long  utilized  ad  hoc 
milestones  such  as  completion  of  design  or 
completion  of  coding.  However,  these 
milestones  are  notoriously  unreliable. 

Tracking software projects requires dealing with two separate issues: 1) achieving specific
and tangible milestones, and 2) expending resources and funds within specific budgeted amounts.

Because software milestones and costs are affected by requirements changes and scope creep,
it is important to measure the increase in size of requirements changes when they affect
function point totals. However, there are also requirements changes that do not affect function
point totals, which are termed requirements churn. Both creep and churn occur at random
intervals. Churn is harder to measure than creep and is often measured via backfiring, or
mathematical conversion between source code statements and function point metrics.

For an industry now more than 50 years old, it is somewhat surprising that there is not a
general or universal set of project milestones for indicating tangible progress. From the
author’s assessment and baseline studies, Table 1 shows some representative milestones that
have shown practical value.

The most important aspect of Table 1 is that every milestone is based on completing a
review, inspection, or test. Just finishing up a document or writing code should not be
considered a milestone unless the deliverables have been reviewed, inspected, or tested.

Suggested Format for Monthly Status Reports for Software Projects

A suggested format for monthly progress tracking reports delivered to clients and higher
management would include the following:

1. Status of last month’s red flag problems.

2. New red flag problems noted this month.

3. Change requests processed this month versus change requests predicted.

4. Change requests predicted for next month.

5. Size in function points for this month’s change requests.

6. Size in function points predicted for next month’s change requests.

7. Schedule impacts of this month’s change requests.

8. Cost impacts of this month’s change requests.

9. Quality impacts of this month’s change requests.

10. Defects found this month versus defects predicted.

11. Defects predicted for next month.

12. Costs expended this month versus costs predicted.

13. Costs predicted for next month.

14. Deliverables completed this month versus deliverables predicted.

15. Deliverables predicted for next month.
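
Several of these items pair a predicted value with an actual one, which lends itself to a simple record that flags large deviations automatically. The sketch below is one possible shape for such a record, not a format prescribed by the author: the class names, the subset of items modeled, the sample figures, and the 10 percent tolerance are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlannedVsActual:
    predicted: float
    actual: float

    @property
    def variance(self) -> float:
        """Actual minus predicted; positive means more than predicted."""
        return self.actual - self.predicted


@dataclass
class MonthlyStatus:
    red_flags: list                    # items 1 and 2
    change_requests: PlannedVsActual   # item 3
    change_size_fp: float              # item 5
    defects_found: PlannedVsActual     # item 10
    costs: PlannedVsActual             # item 12
    deliverables: PlannedVsActual      # item 14

    def exceptions(self, tolerance=0.10):
        """Names of tracked quantities that miss their prediction by more than `tolerance`."""
        tracked = {
            "change_requests": self.change_requests,
            "defects_found": self.defects_found,
            "costs": self.costs,
            "deliverables": self.deliverables,
        }
        return [
            name for name, item in tracked.items()
            if item.predicted and abs(item.variance) / item.predicted > tolerance
        ]


report = MonthlyStatus(
    red_flags=["Key test engineer resigned"],
    change_requests=PlannedVsActual(predicted=10, actual=14),
    change_size_fp=120,
    defects_found=PlannedVsActual(predicted=200, actual=205),
    costs=PlannedVsActual(predicted=500_000, actual=510_000),
    deliverables=PlannedVsActual(predicted=6, actual=4),
)
print(report.exceptions())  # ['change_requests', 'deliverables']
```

The point of the design is that a report built this way cannot show a deliverable shortfall or a surge in change requests without also surfacing it as an exception.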

An interesting question is the frequency with which milestone progress should be reported.
The most common reporting frequency is monthly, although exception reports can be filed at any
time it is suspected that something has occurred that can cause perturbations. For example,
serious illness of key project personnel or resignation of key personnel might very well affect
project milestone completions; this kind of situation cannot be anticipated.

It might be thought that monthly reports are too far apart for small projects that last six
months or less in total. For small projects, weekly reports might be preferred. However, small
projects usually do not get into serious trouble with cost and schedule overruns, whereas large
projects almost always get in trouble with cost and schedule overruns. This article concentrates
on the issues associated with large projects. In the litigation where the author has been an
expert witness, every project in litigation except one was larger than 10,000 function points
in size.

 1. Requirements document completed.
 2. Requirements document review completed.
 3. Initial cost estimate completed.
 4. Initial cost estimate review completed.
 5. Development plan completed.
 6. Development plan review completed.
 7. Cost tracking system initialized.
 8. Defect tracking system initialized.
 9. Prototype completed.
10. Prototype review completed.
11. Complexity analysis of base system (for enhancement projects).
12. Code restructuring of base system (for enhancement projects).
13. Functional specification completed.
14. Functional specification review completed.
15. Data specification completed.
16. Data specification review completed.
17. Logic specification completed.
18. Logic specification review completed.
19. Quality control plan completed.
20. Quality control plan review completed.
21. Change control plan completed.
22. Change control plan review completed.
23. User information plan completed.
24. User information plan review completed.
25. Code for specific modules completed.
26. Code inspection for specific modules completed.
27. Code for specific modules unit tested.
28. Test plan completed.
29. Test plan review completed.
30. Test cases for specific test stage completed.
31. Test case inspection for specific test stage completed.
32. Test stage completed.
33. Test stage review completed.
34. Integration for specific build completed.
35. Integration review for specific build completed.
36. User information completed.
37. User information review completed.
38. Quality assurance sign off completed.
39. Delivery to beta test clients completed.
40. Delivery to clients completed.

Table 1: Representative Tracking Milestones for Large Software Projects

Failing or delayed projects usually lack serious milestone tracking. Activities are often
reported as finished while work is still ongoing. Milestones on failing projects are usually
dates on a calendar rather than completion and review of actual deliverables.

Delivering documents or code segments that are incomplete, contain errors, and cannot
support downstream development work is not the way milestones are used by industry leaders.

Because milestone tracking occurs throughout software development, it is the last line of
defense against project failures and delays. Milestones should be established formally and
should be based on reviews, inspections, and tests of deliverables. Milestones should not be the
dates that deliverables more or less were finished; they should reflect the dates that finished
deliverables were validated by means of inspections, testing, and quality assurance review.
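
The distinction between a deliverable being finished and a milestone being achieved can be made explicit in whatever tool records progress. A minimal sketch, assuming a hypothetical `Milestone` record with separate dates for the two events (the names and sample dates are illustrative only):

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Milestone:
    name: str
    finished_on: Optional[date] = None    # the producer reports the work as done
    validated_on: Optional[date] = None   # a review, inspection, or test has passed

    @property
    def achieved_on(self) -> Optional[date]:
        """A milestone counts only from the date its deliverable was validated."""
        return self.validated_on


def progress(milestones):
    """Fraction of milestones whose deliverables have been validated."""
    achieved = sum(1 for m in milestones if m.achieved_on is not None)
    return achieved / len(milestones)


plan = [
    Milestone("Functional specification", date(2008, 1, 15), date(2008, 1, 29)),
    Milestone("Code for module A", finished_on=date(2008, 3, 3)),  # not yet inspected
]
print(progress(plan))  # 0.5
```

Both deliverables here have been reported as finished, yet only one counts toward progress, which is the behavior the paragraph above calls for.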

Summary  and  Results 

Overcoming the risks shown here is largely a matter of opposites, or doing the reverse of
what the risk indicates. Thus a well-formed software project will create accurate estimates
derived from empirical data and supported by automated tools for handling the critical path
issues. Such estimates will be based on the actual capabilities of the development team and
will not be arbitrary creations derived without any rigor. The plans will specifically address
the critical issues of change requests and quality control. In addition, monthly progress
reports will also deal with these critical issues. Accurate progress reports are the last line
of defense against failures.
Does Project Performance Stability Exist?

A  Re-examination  of  CPI  and  Evaluation  of  SPI(t)  Stability 


Kym Henderson
PMI College of Performance Management

Dr. Ofer Zwikael
Victoria University of Wellington

The development of the Earned Schedule (ES) method by Walt Lipke in 2003 has been shown to be an important
extension to the Earned Value Management (EVM) method, increasing the utility of EVM data for project schedule
analysis, control, and oversight. As ES provides a reliable time-based indicator of schedule performance, the
objective of this article is to investigate whether the Schedule Performance Index (time) (SPI(t)) exhibited
similar stability characteristics to those extensively reported for the Cost Performance Index (CPI) in EVM. This
article analyzes EVM data from three different countries for projects in three industry segments. There were 37
projects examined for SPI(t) stability and 26 for CPI stability.

It has been found that while the behavior of SPI(t) is broadly consistent with CPI, the widely reported CPI
stability rule cannot be generalized even within the U.S. Department of Defense (DoD) project portfolio. Further
research is required to develop improved understanding of project performance characteristics and the behavior of
CPI and the SPI(t).


The cancellation of the U.S. Navy’s A-12 Avenger II stealth aircraft program in January
1991 prompted research during the 1990s that investigated the reliability of EVM cost
prediction and the behavior of the CPI using DoD project data [1, 2]. These research findings
have come to be regarded as generally applicable across all project types using EVM across
multiple industry sectors. A finding regarded as particularly significant was that CPI
stabilizes by 20 percent of project completion.

Lipke proposed the ES method in 2003 to provide time-based measures of schedule performance
utilizing EVM data. Initial validation has shown the time-based, ES-derived SPI(t) to be
reliable for both early and late finish projects. For a technical description of the ES method,
the reader is referred to [3]. For an excellent, easy-to-read, non-technical but comprehensive
discussion of the ES method, refer to [4].

Following the initial validation of ES, interest developed in ascertaining whether SPI(t)
exhibited similar stability characteristics to those extensively reported for CPI. The objective
of this article is to re-examine CPI stability and to compare the stability behavior of the
SPI(t) with CPI.

This article found that while the behavior of the SPI(t) is broadly consistent with CPI,
the widely reported CPI stability rule cannot be generalized to all projects utilizing the EVM
method or even within the DoD project portfolio. However, the fact that SPI(t) behaves
consistently with CPI provides further support for the validity of the SPI(t) metric and the ES
method.

Additional analysis was unable to establish a correlation between achieving earlier CPI and
SPI(t) stability and improved outcomes at completion. In certain cases, where projects achieved
under budget and/or early finish outcomes with cost and/or schedule stability achieved late,
earlier cost and/or schedule stability would have been disadvantageous to the actual final
outcome(s) achieved. This is because CPI and/or SPI(t) progressively improved over the life of
those projects.

This article also demonstrates that by utilizing ES, research of schedule performance using
EVM data is now possible and leads to improved understanding of the dynamics of project
schedule and project cost performance.

Background 

The CPI has long been a key indicator used to analyze the cost performance of projects
using EVM. The first empirical confirmation of the widely reported and referenced CPI stability
rule was by Christensen and Payne in 1992, using data from 26 completed U.S. Air Force
contracts. The data used came from the cost library of the U.S. Air Force Systems Command
Aeronautical Systems Division [5].

Christensen and Templin conveniently summarized the series of research findings subsequent
to [5] in 2002:

. . .  the  range  of  the  cumulative  CPI 
from  the  20  percent  completion 
point  to  contract  completion  was 
less  than  0.20  for  every  contract. 
This  result  is  usually  interpreted  to 
mean  that  the  cumulative  CPI  does 
not  change  by  more  than  plus  or 
minus  0.10  from  its  value  at  the  20 
percent  completion  point,  and  is 
used  to  evaluate  the  reasonableness 
of  projected  cost  efficiencies  on 
future  work  [6]. 
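
The plus-or-minus 0.10 reading of the rule quoted above can be checked mechanically against a series of cumulative CPI values. The sketch below is an illustration of that reading only; the function name, the input layout, and the sample series are assumptions, and it is not the test procedure used in the studies cited.

```python
def cpi_stable_after_20_percent(percent_complete, cpi_cum, band=0.10):
    """Check whether cumulative CPI stays within +/- `band` of its value at 20 percent complete.

    percent_complete: ascending fractions of completion (0.0 to 1.0), one per report.
    cpi_cum: cumulative CPI at each of those reports.
    """
    # Reference point: the first report at or beyond 20 percent complete.
    start = next((i for i, pc in enumerate(percent_complete) if pc >= 0.20), None)
    if start is None:
        raise ValueError("no report at or beyond 20 percent complete")
    reference = cpi_cum[start]
    return all(abs(cpi - reference) <= band for cpi in cpi_cum[start:])


completion = [0.10, 0.20, 0.40, 0.60, 0.80, 1.00]
steady = [0.80, 0.95, 0.97, 0.93, 0.96, 0.94]      # hypothetical project
improving = [0.80, 0.85, 0.92, 0.98, 1.03, 1.08]   # hypothetical project

print(cpi_stable_after_20_percent(completion, steady))     # True
print(cpi_stable_after_20_percent(completion, improving))  # False
```

The second series fails the check because its cost efficiency keeps improving, a reminder that a project can break the rule for favorable reasons.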

Christensen and Payne made the following observations on the perceived importance of CPI
stability:

•  A  stable  CPI  is  evidence  that  the 
contractor’s  management  control 
systems,  particularly  the  planning, 
budgeting,  and  accounting  systems, 
are  functioning  properly. 

•  A  stable  CPI  may  thus  indicate  that 
the  contractor’s  estimated  final 
costs  of  the  authorized  work, 
termed  Estimated  at  Completion 
(EAC),  are  reliable. 

•  In addition, knowing that the CPI
is stable may help the analyst evaluate
the capability of a contractor
to recover from a cost overrun by
comparing the CPI with other key
indicators, such as the To-Complete
Performance Index [5].

Over  time,  the  widely  reported  CPI 
stability  findings  have  been  generalized  as 
being  applicable  to  all  projects  utilizing  the 
EVM  method  [7-11].  An  extensive  litera¬ 
ture  review  has  not  found  further  empiric 
validation  of  the  CPI  stability  rule  beyond 
the  project  data  obtained  in  the  initial 
paper  and  data  from  the  DoD  Defense 
Acquisition  Executive  Summary  (DAES) 
database. 

Concurrent research into the stability characteristics of the
EVM SPI was not possible: the SPI is known to fail as a
statistical predictor because it always returns to unity at
project completion, irrespective of any duration-based delay.
The SPI is also recognized as failing, nominally, within the
final third of the project, and it fails once the project's
planned duration has been exceeded.
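
The return to unity follows directly from the definition of the cost-based SPI. The short derivation below is a sketch using the standard EVM definitions; the symbol BAC (budget at complete) is notation assumed here for illustration.

```latex
\begin{align*}
\mathrm{SPI} &= \frac{EV}{PV} \\
% Once the planned duration has passed, all planned value has been scheduled:
PV &= BAC \quad \text{(from the planned completion date onward)} \\
% At project completion all work has been earned, however late it finishes:
EV &= BAC \quad \text{(at completion)} \\
\mathrm{SPI}_{\text{completion}} &= \frac{BAC}{BAC} = 1
\end{align*}
```

Because the final value is 1 for every completed project, late or not, the cost-based SPI carries no information about the eventual delay as completion approaches.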

Lipke  proposed  the  ES  method  in 
2003  as  a  solution  to  these  limitations  and 
flaws of the EVM schedule indicators [3].
A  series  of  studies  provided  initial  valida- 
                   CPI Stability                 SPI(t) Stability
Sample             Test Statistic  Test Result   Test Statistic  Test Result
UK Construction    0.623           Ho            0.748           Ho
Australian IT      1.000           Ho            0.500           Ho
Israeli Hi-Tech    0.806           Ho            0.613           Ho
Composite          0.916           Ho            0.629           Ho

Table 1: Hypothesis Test Results


Stability Achieved     UK Construction  Australian IT  Israeli Hi-Tech  Composite
SPI(t) cum.  < 20%     3                0              1                4
             > 20%     17               5              11               33
CPI cum.     < 20%     2                0              1                3
             > 20%     8                4              11               23

Table 2: Summary of Stability Achievement Related to 20 Percent Completion


Total  Projects  Within  Each  Stability  Percentile  Band 
(Three  Data  Samples  Aggregated) 


Figure  1:  Total  Projects  CPI  and  SPI(t)  Stability  Within  Each  10  Percentile  Band 


Project  Completion  Categories  by  CPI  Stability  Bands 
(Three  Data  Samples  Aggregated) 


Figure  2:  Project  Completion  Categories  by  CPI  Stability  Band 


tion  of  the  ES  method,  some  by  using  real 
EVM  project  data  and  also  by  using  simu¬ 
lated  work  schedules  [12-16].  The  time- 
based  ES  derived  SPI(t)  has  been  shown 
to  be  reliable  for  both  early  and  late  finish 
projects.  The  SPI(t)  only  reverts  to  unity  at 
project  completion  if  on-time  completion 
has  been  achieved. 

A  research  study  intended  to  validate 
the  ES  construct  using  DAES  data  was 
commissioned  in  2004  and  undertaken  by 
a  U.S.  Air  Force  Institute  of  Technology 
master's student. Unfortunately, this study
was  discontinued  after  an  independent 
review  determined  the  following: 

Results:  The  historical  data  collec¬ 
tion  procedures  for  the  DoD  and 
U.S.  Air  Force  do  not  allow  for  suf¬ 
ficient  testing  of  ES  theory  at  this 
time.  A  statistical  evaluation  con¬ 
cluded  that  SPI(t)  is  different  than 
SPI($);  however,  the  two  variables 
are  highly  correlated.  The  result  of 
the  analysis  identified  that  SPI(t) 
performs  similarly  to  SPI($)  with 
the  data  contained  in  the  DAES 
database.  In  order  for  the  ES  theo¬ 
ry  to  be  fully  investigated,  addition¬ 
al  data  must  be  collected.  This 
research  shows  that  the  necessary 
data  may  also  not  be  available 
despite  the  best  collection  efforts. 
The  original  schedule  and  planned 
duration  information  is  critical  to 
successful  evaluation  of  the  ES 
methodology  [17]. 

However,  early  interest  by  the  Project 
Management  Institute  (PMI)  resulted  in 
the  principles  of  ES  being  included  as  an 
Emerging  Practice  Insert  in  the  Practice 
Standard  for  Earned  Value  Management 
published in 2004 [18].

Following the initial validation of ES, interest developed in
ascertaining whether the SPI(t) exhibited similar stability
characteristics to those extensively reported for the CPI. The
objective of this article is to re-examine CPI stability and to
compare the stability behavior of the SPI(t) with CPI.

Method  for  Evaluating 
Stability 

EVM project data was loaded into a Microsoft Excel Stability
Point Calculator (see Note 3) developed by Lipke. The calculator
determines the observation number in a sequence of CPI and SPI(t)
values at which all subsequent observations are within a defined
stability limit. The stability limit used is 0.10. The calculator
also enables the associated percentage complete at which
stability occurs to be determined.
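
The stability point search described above can be sketched in a few lines. This is a minimal illustration, not the logic of Lipke's Excel calculator: it assumes that "within the stability limit" means within plus or minus 0.10 of the final cumulative value, and the function name and sample numbers are hypothetical.

```python
from typing import Optional, Sequence


def stability_point(values: Sequence[float], limit: float = 0.10) -> Optional[int]:
    """Return the 1-based observation number from which every later
    observation stays within `limit` of the final value.

    Assumption: stability is judged against the final cumulative index.
    Returns None for an empty series.
    """
    if not values:
        return None
    final = values[-1]
    point = len(values)
    # Walk backwards from the last observation until one falls outside the band.
    for i in range(len(values) - 1, -1, -1):
        if abs(values[i] - final) > limit:
            break
        point = i + 1
    return point


# Hypothetical cumulative CPI series and the percent complete at each observation.
cpi = [0.70, 0.85, 1.02, 0.95, 0.93, 0.97, 1.00]
percent_complete = [5, 12, 20, 35, 55, 80, 100]

obs = stability_point(cpi)
print(obs, percent_complete[obs - 1])
```

Pairing the returned observation number with the percent complete recorded for that observation gives the completion point at which stability occurred.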

To determine the significance of the observations of stability
for both CPI and SPI(t), statistical hypothesis testing is
conducted. The test applied is the Sign Test at 0.05 level of
significance (see Note 4) [19]. The Sign Test was used in this
research because it does not depend upon the data having a
normal distribution. In past research, the hypothesis test
method chosen implied that the data was normally distributed;
however, the normality of the data was not established. Research
by Lipke also suggests the following:

Results  indicate  the  logarithm  data 
representations  of  the  indexes  are 
likely  normally  distributed,  whereas 
the  distributions  for  CPI,  SPI,  and 
CV  are  not  [20]. 

The question to answer regarding stability is: Can it be stated
generally and reliably that the final value of the performance
index is within 0.10 of its value when the project is 20 percent
complete? The answer to the question will be yes if the alternate
hypothesis is satisfied:

H1 (CPI): |CPI(final) - CPI(20%)| < 0.10

H2 (SPI(t)): |SPI(t)(final) - SPI(t)(20%)| < 0.10

Two  separate  hypothesis  tests  are  con¬ 
ducted,  one  for  CPI  and  one  for  the 
SPI(t).  The  result  from  the  hypothesis 
testing  is  recorded  as  Ha  when  the  value 
of  the  test  statistic  is  in  the  critical  region 
(0.05)  and  Ho  (null  hypothesis)  when  it  is 
not. 
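
The article does not spell out how the test statistic is computed, so the following is only a generic one-sided sign test, offered as a sketch of the idea. It assumes each project is scored as a success when its final index is within 0.10 of its value at 20 percent complete, and that under the null hypothesis success and failure are equally likely. The function names and sample pairs are hypothetical, and the code is not intended to reproduce the values reported in Table 1.

```python
from math import comb
from typing import Iterable, Tuple


def count_within_limit(pairs: Iterable[Tuple[float, float]], limit: float = 0.10) -> int:
    """Count projects whose final index is within `limit` of the 20 percent value."""
    return sum(1 for at_20, final in pairs if abs(final - at_20) < limit)


def sign_test_p_value(successes: int, trials: int) -> float:
    """One-sided sign test: probability of at least `successes` successes in
    `trials` when each outcome has probability 0.5 (the null hypothesis)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    tail = sum(comb(trials, k) for k in range(successes, trials + 1))
    return tail / 2 ** trials


# Hypothetical (index at 20 percent, final index) pairs, not data from the article.
sample = [(0.92, 0.98), (1.10, 0.85), (0.75, 0.97), (1.02, 1.05), (0.88, 1.12)]
stable = count_within_limit(sample)
p = sign_test_p_value(stable, len(sample))
# Record Ha only when the statistic falls in the critical region (0.05).
print(stable, p, "Ha" if p < 0.05 else "Ho")
```

Because the test only counts successes and failures, it makes no assumption about the distribution of the index values, which is the property the authors cite for choosing it.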

The  Data 

A composite EVM data set was assembled comprising commercial
sector data samples obtained from the following:

•  Twenty-four United Kingdom (UK) construction projects.

•  Twelve Israeli high-technology (Hi-Tech) projects.

•  Nine Australian Information Technology (IT) projects.

The  EVM  data  consists  of  direct  labor 
costs  only  with  the  following: 

•  UK  construction  projects  recorded  in 
person  days  weekly  with  EVM  values 
expressed  as  a  percentage  of  the  bud¬ 
get  at  complete  to  further  maintain 
data  anonymity. 

•  Israeli Hi-Tech projects recorded in
U.S. dollars monthly.

•  Australian  IT  projects  recorded  in 
Australian  dollars  weekly. 


An  extensive  review  of  the  data  was 
undertaken.  Projects  were  excluded  from 
the  sample  for  a  variety  of  reasons  includ¬ 
ing  the  following: 

•  Lack  of  data  integrity. 

•  Lack  of  EV  data  at  20  percent  of  pro¬ 
ject  completion. 

•  Partially  incomplete  Planned  Value 
data. 

•  Lack  of  required  Actual  Cost  (AC) 
data. 

Ten UK construction projects are included in the CPI stability
research sample. Five of these projects were included although
the final AC data available was recorded at between 96.7 percent
and 99.0 percent complete. Including those five projects is
consistent with the approach adopted in Christensen and Payne's
research and assumes that the difference between CPI(final) and
the latest available CPI has no material impact on the findings [5].

The  outcome  was  a  usable  data  sample 
of  the  following: 

•  Twelve  Israeli  Hi-Tech  projects  for  the 
SPI(t)  and  CPI  stability  research. 

•  Twenty  UK  construction  projects  for 
the  SPI(t)  stability  and  10  for  CPI  sta¬ 
bility  research. 

•  Five  Australian  IT  projects  for  the 
SPI(t)  stability  and  four  for  CPI  stabil¬ 
ity  research. 

Stability  Evaluation  Results 

The  results  of  the  sign  tests  for  the  fol¬ 
lowing  hypothesis  are  shown  in  Table  1: 
Can  it  be  stated  generally  and  reliably  that 
the  final  value  of  the  performance  index  is 
within  0.10  of  its  value  when  the  project  is 
20  percent  complete?  Recall  that  the  test 
result  of  Ha  indicates  stability  of  the  per¬ 
formance  indicators  CPI  and  the  SPI(t). 
As  is  shown,  the  test  results  did  not  have 
any  test  statistic  in  the  critical  region  (0.05). 
As  a  result,  none  of  the  null  hypotheses 
can  be  rejected  for  any  of  the  three  sam¬ 
ples  or  the  composite  of  all  samples.  This 
means  that  stability  was  not  achieved  for 
either  CPI  or  the  SPI(t)  by  the  time  the 
project  was  20  percent  complete. 

This  research  does  not  support  the 
previously  referenced  generalizations  that 
the  CPI  stability  rule  has  universal  applica¬ 
bility  for  all  projects  utilizing  the  EVM 
method.  Because  the  SPI(t)  index  demon¬ 
strates  a  similar  lack  of  stability  to  that 
found  for  CPI,  the  validity  of  the  SPI(t) 
metric  is  supported  due  to  the  consistent 
behavior  demonstrated  with  CPI. 

Table  2  summarizes  the  raw  data  in 
relation  to  the  numbers  of  projects  that 
achieved  stability  before  or  after  20  per¬ 
cent  completion  for  the  SPI(t)  and  CPI  by 
each  project  set  and  for  the  composite  of 


all.  It  can  be  seen  that  the  majority  of  pro¬ 
jects  reach  stability  only  after  the  20  per¬ 
cent  completion  point. 
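
As a quick check on that statement, the composite column of Table 2 can be turned into proportions. The snippet below is a small worked calculation using only the Table 2 counts; the variable and function names are illustrative.

```python
# Composite counts from Table 2: projects reaching stability
# before and after the 20 percent completion point.
table2_composite = {
    "SPI(t)": {"before_20": 4, "after_20": 33},
    "CPI": {"before_20": 3, "after_20": 23},
}


def late_share(before: int, after: int) -> float:
    """Fraction of projects that reached stability only after 20 percent complete."""
    return after / (before + after)


for index, counts in table2_composite.items():
    share = late_share(counts["before_20"], counts["after_20"])
    print(f"{index}: {share:.0%} of projects stabilized after 20 percent complete")
```

For both indexes, roughly nine projects in ten reached stability only after the 20 percent point.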

Figure  1  summarizes  each  10  percent 
complete  percentile  band  where  CPI  and 
the  SPI(t)  stability  occurred.  This  figure 
shows  the  following: 

•  The wide variability in the achievement of stability for both
CPI and the SPI(t). Project performance heuristics or rules of
thumb intended to be generally applicable (e.g., the CPI
stability rule) require an empirically established consistency
of behavior across a broad range of projects. These findings are
a significant impediment to proposing and confirming broadly
applicable CPI and SPI(t) stability heuristics.

•  That  stability  is  usually  achieved  very 
late  in  the  project  life  cycle,  often  later 
than  80  percent  complete  for  projects 
in  these  samples. 

Zwikael  analyzed  the  Israeli  Hi-Tech 
project  sample  using  visual  inspection  of 
charts  and  suggested  that  CPI  stability  was, 
on  average,  achieved  at  the  60  percent 
completion  point  [21].  That  analysis 
broadly  confirms  this  article’s  finding  of 
CPI  stability  being  achieved  much  later  in 
the  project  life  cycle  than  previously 
reported. 

Additional  Analysis 

Following the lack of CPI and SPI(t) stability findings,
additional analysis was conducted. Within each 10 percent
complete percentile band, projects were categorized as follows:

•  Cost  at  completion: 

o  Under  or  On  Budget  (UOB). 
o  Over  Budget  (OvB). 

•  Schedule  at  completion: 

o  Early or On Time finish (EOT).
o  Late  Finish  (LF). 

The  purpose  of  this  analysis  is  to 
determine  if  there  is  a  correlation  between 
achieving  earlier  CPI  and  the  SPI(t)  stabil¬ 
ity  and  improved  project  outcomes. 

Figure 2 summarizes the analysis for CPI and Figure 3 does the
same for the SPI(t). With the data samples utilized, achievement
of earlier stability is not correlated with improved final cost
and/or schedule outcomes.

For UOB and EOT projects where cost and schedule stability was
achieved late (after, say, 60 percent completion), achieving
earlier stability would have been disadvantageous to the final
outcome(s) achieved because project performance progressively
improved over the life of those projects.

Figure  4  summarizes  projects  (with 
the  required  comparative  data),  which 



Project Completion Categories by SPI(t) Stability Bands
(Three Data Samples Aggregated)

[Bar chart: number of projects finishing early or on time (EOT) and late (LF) within each 10 percent SPI(t) stability band, for the Israeli Hi-Tech, Australian IT, and UK Construction samples and their totals.]

Figure 3: Project Completion Categories by SPI(t) Stability Band


achieved  SPI(t)  or  CPI  stability  first. 
Achieving  SPI(t)  stability  first  implies 
schedule  management  had  a  higher  man¬ 
agement  priority;  achieving  CPI  stability 
first  implies  cost  management  had  the 
higher  priority. 

In the Australian IT projects sample, SPI(t) stability was
achieved first for the preponderance of projects. For the other
data samples, cost stability and schedule stability were achieved
first in roughly equal proportion. In only one project in these
samples, an Australian IT project, were cost and schedule
stability achieved simultaneously.

Corroboration  With  Other 
Research 

Because  of  the  comprehensive  contradic¬ 
tion  to  the  previously  published  CPI  sta¬ 
bility  research  findings,  a  further  literature 
review  was  undertaken.  This  review 
obtained  a  most  unexpected  source  of 
independent  corroboration  for  this  arti¬ 


cle’s  CPI  stability  findings.  In  the  mid-90s, 
Michael  Popp,  a  civilian  employee  of  the 
U.S.  Naval  Air  Command  (NAVAIR),  ini¬ 
tiated  an  internal  DoD  research  project 
within  NAVAIR. 

The  output  was  an  internal  but 
unclassified  NAVAIR  report  (the  Popp 
report)  which  has,  with  Popp’s  permis¬ 
sion,  now  been  placed  into  the  public 
domain  on  the  PMI  Sydney  Chapter  Web 
site  [22].  The  purpose  of  the  Popp  study 
was  to  develop  probability  distributions 
of  cost  EACs  based  on  the  CPI  at  com¬ 
plete,  current  CPI,  and  percentage  com¬ 
plete  of  projects  based  on  history.  As 
stated  in  the  report:  Given  a  program  has 
a  CPI  of  X  and  a  percent  complete  of  Y, 
what  is  the  most  likely  finishing  CPI  [22]? 

In contrast to Christensen and associates'
research, which used data from the
DAES  database,  the  data  used  by  Popp 
was  sourced  from  the  Contracts  Analysis 
System  database  maintained  by  the  Office 
of  the  Secretary  of  Defense  Cost 
Analysis  Improvement  Group. 


The  research  undertaken  by  Popp  did 
not  focus  on  CPI  stability.  However, 
charts  which  can  also  be  used  for  assess¬ 
ing  CPI  stability  were  completed  as  part 
of that study. These charts correlate the cumulative CPI for the
percentage complete in each 10 percent complete percentile band
to the CPI(final) for all projects in that sample.

Figure 5 is the first chart of interest from the Popp report, as
it shows the correlation between the cumulative CPI at 10-20
percent complete and the final CPI for all projects in the sample.

The  area  of  the  chart  enclosed  within 
the  dashed  lines  bounds  the  area  in  which 
the  correlation  plots  must  occur  for  the 
Christensen  derived  CPI  stability  rule  to 
apply.  Those  plots  which  occur  outside 
the  enclosed  area  are  also  in  conflict  with 
the  Christensen  derived  CPI  stability  rule. 
The  limited  data  samples  used  in  this 
analysis  are  sufficient  to  show  that  the 
CPI  stability  rule  cannot  be  generalized 
even  within  the  DoD  project  portfolio. 

While  research  using  the  Popp  report 
data  sample  was  not  principally  directed 
at  examining  the  validity  of  the  CPI  sta¬ 
bility  rule,  this  research  found  the  follow¬ 
ing: 

•  Development  programs  at  20  percent 
(completion),  programs  with  a  cumulative 
CPI  below  0.89  improve  which  was  close 
to  Christensen  (findings),  but  with  some 
exceptions. 

•  Production  programs  at  20  percent 
(completion),  programs  with  a  cumulative 
CPI  below  0.84  improve,  again  close  to 
Christensen  (findings),  but  with  some  excep¬ 
tions  [23]. 

Using the enclosure technique, Figure 6 shows that the
preponderance of plots occur within the area where the CPI
stability rule applies at 20 percent completion. The conclusion
is that for the DoD project data used by Popp, CPI stability was
also achieved very late in the project life cycle, often as late
as 70-80 percent completion. This finding is consistent with the
late CPI stability findings for the commercial sector project
samples as shown in Figure 1.

While  the  underlying  data  was  not 
available  and  further  research  is  required, 
these  findings  also  conflict  with  the  DoD 
research  findings  quoted  in  the  Beach 
report into the A-12 cancellation:

DoD  experience  in  more  than  400 
programs  since  1977  indicates 
without  exception  that  the  cumu¬ 
lative  CPI  does  not  significantly 
improve  during  the  period 
between  15%  and  85%  of  contract 


Figure 4: Summary of Projects Achieving SPI(t) or CPI Stability First

Figure 5: Correlation Between Cumulative CPI at 10-20 Percent Complete and Final CPI (Popp)


performance;  in  fact,  it  tends  to 
decline  [1]. 

Some  projects  in  the  Popp  sample  show  a 
trend  of  CPI  performance  improvement, 
from  CPLo  percent  and  in  a  smaller  number 
of  cases,  as  late  as  CPLo  percent  to  CPLinai. 

Summary  and  Conclusion 

The  initial  objective  of  this  article  — 
ascertaining  whether  the  SPI(t)  demon¬ 
strates  similar  stability  characteristics  to 
those  extensively  reported  for  CPI  —  was 
not  achieved.  This  article  has  found  that 
while  the  behavior  of  the  SPI(t)  is  broad¬ 
ly  consistent  with  CPI,  the  widely  report¬ 
ed  CPI  stability  rule  cannot  be  general¬ 
ized  to  all  projects  using  the  EVM 
method  or  even  within  the  DoD  project 
portfolio.  However,  the  consistent  behav¬ 
ior  to  CPI  demonstrated  by  the  SPI(t) 
provides  further  support  for  the  validity 
of  the  SPI(t)  metric  and  the  ES  method. 

Additional analysis was unable to establish a correlation
between achieving earlier CPI and SPI(t) stability and improved
outcomes at completion. In cases where projects achieved under
budget and/or early finish outcomes with cost and/or schedule
stability achieved late (i.e., after, say, 60 percent
completion), earlier cost and/or schedule stability would have
been disadvantageous to the actual final outcome(s) achieved.
This is because CPI and/or the SPI(t) were progressively
improving over the life of those projects.

The findings and corroboration of this article call for
significant review and revision of what has been regarded as a
long-settled EVM heuristic on CPI stability, and of the practice
that follows from it, including the use of a stable CPI as
evidence that an EVM system is functioning properly and that an
EAC is reliable [5].

Improvements to current EVM techniques for predicting future
cost performance should be considered, as current techniques
have relied on generalizing research findings from limited data
sources, principally the DAES database.

Alternative methods of cost and schedule prediction using
well-established statistical principles and methods developed by
Lipke show the following promise:

•  These techniques allow generation of a range of cost and
schedule predictions from user-defined Confidence Limit(s).

•  All information and data required for these predictions comes
from within the project itself.

This  may  reduce  the  current  depen¬ 


dence  on  heuristics  developed  from 
external  project  data  sources,  which 
might  not  be  applicable  to  the  project  of 
interest. 

To promote trials of these statistical prediction techniques, a
freely available calculator can be found on the ES Web site. An
academic article fully describing the statistical prediction
techniques and supporting rationales is pending publication [24].
The statistical prediction techniques developed have been
summarized in a presentation by Henderson, which is available
at [25].

A major advance to EVM practice and future research
opportunities would be development of a broadly based EVM
research database where completed EVM project data could be
submitted anonymously for the following:

•  Research purposes.

•  Benchmarking completed project performance.

•  Assisting in the sizing of projects.

Such knowledge bases already exist in other disciplines, with an
instructive Australian example being the ISBSG Web site at
<www.isbsg.org>.

Improved  data  collection  techniques 
to  ensure  that  baseline  schedule  informa¬ 
tion  is  captured  and  stored  in  the  DAES 
database  are  also  recommended. 

Final  Remarks  and  Future 
Research 

While this article has overturned long-standing findings and
beliefs on CPI stability, it is important that the strengths and
limitations of the EVM method are properly understood,
particularly in the following areas:

•  Adoption of EVM by U.S. government agencies through the Office
of Management and Budget Circular A-11 Part 7 mandate.

•  Advocacy of the use of EVM cost predictors to assess
compliance to the Sarbanes-Oxley Act [9].

•  Increased interest and the adoption of EVM by organizations
globally.

Where projects have not exhibited CPI stability, EVM
practitioners can now know that this is neither unique, nor is
it necessarily an adverse reflection on the management or
execution of those projects.

Figure 6: Correlation Between Cumulative CPI at 70-80 Percent Complete and Final CPI (Popp)

Various follow-on research opportunities arise from this
article, which may develop improved understanding of project
performance characteristics and generalizable heuristics.
Suggestions include examining the performance characteristics of
projects where the following happens:

•  The  CPI  stability  rule  does  seem 
applicable  (e.g.,  the  subset  highlighted 
in  the  Popp  report  data)  to  determine 
whether  there  are  project  characteris¬ 
tics  which  result  in  early  CPI  stability. 

•  Early  CPI  stability  was  not  achieved 
due  to  progressively  improving  CPI 
performance  over  the  project  life 
cycle. 

Academically  oriented  research  aimed 
at  establishing  a  theoretical  rationale  for 
project  performance  instability  would  be 
another  useful  addition  to  the  project 
management  body  of  knowledge. 

While [23] provides the sobering assessment, consistent with
Christensen's findings, that average to good programs do not
improve, an understanding of the project characteristics that
result in progressively improving CPI would be an extremely
useful advance to practice if these characteristics could be
emulated in other programs. Such research could offer
significant opportunities for tangibly improving project
performance.

Research  opportunities  are  equally 
applicable  to  project  schedule  perfor¬ 
mance.  This  article  also  demonstrates  that 
by  using  ES,  research  of  schedule  perfor¬ 
mance  using  EVM  data  is  possible  and 
already  leading  to  improved  understanding 
of  the  dynamics  of  project  schedule  and 
project cost performance.
Notes

1.  Unless otherwise stated, all references to CPI and the SPI(t)
refer to the cumulative values.

2.  Project has been used consistently throughout this article.
In the U.S. government, particularly the DoD context, program
may be the more appropriate term.

3.  This calculator has been placed into the public domain to
encourage more broadly based CPI and SPI(t) stability research
and is freely available from the ES Web site at
<www.earnedschedule.com/Calculator.shtml>.

4.  Applying  the  Sign  Test  at  0.05  level  of 
significance  means  that  the  test  is 
being  applied  at  a  95  percent  level  of 
confidence. 
Schedule Adherence:

A Useful Measure for Project Management

Walt Lipke
PMI Oklahoma City Chapter

Earned Value Management (EVM) is a very good method of project management. However, EVM by itself cannot provide
information as to how the schedule is being accomplished. Project accomplishment not in accordance with the planned
schedule frequently has adverse repercussions: cost increases and duration is elongated. Thus, managers have a need to more
fully understand project performance. This article utilizes the new practice of Earned Schedule (ES) to discuss a proposed
measure for further enhancing the practice of EVM. The measure, Schedule Adherence, provides additional early warning
information to project managers, thereby enabling improved decision making and enhancing the probability of project success.


Development  of  a  plan  for  executing  a 
project  is  a  difficult  undertaking. 
When  the  plan  is  being  created,  a  work 
flow  is  envisioned  along  with  constraints 
and  resource  availability.  There  is  a  con¬ 
siderable  amount  of  effort  invested  in 
decomposing  the  constituents  of  the  plan 
into  manageable  components  and  work 
packages.  Detailed  examination  of  the 
tasks  themselves  is  made  to  prepare  rea¬ 
sonable  estimates  for  their  cost  and  dura¬ 
tion.  Oftentimes,  planning  teams  use  his¬ 
torical  project  records,  heuristics,  and  sta¬ 
tistical  algorithms  to  determine  best  and 
worst  case  probable  outcomes. 
Furthermore,  to  assure  that  the  best  pos¬ 
sible  plan  is  created,  technical  experts  may 
be  employed  to  make  the  estimates  as 
accurate  as  possible. 

Before  assignments  can  be  made  to  the 
team  members  of  a  project,  the  timing  of 
their  actions  must  be  known  along  with 
their  interdependencies.  The  intricate 
mechanism  for  consolidating  all  of  this 
information  and  making  it  understandable 
to  the  project  team  and  senior  manage¬ 
ment,  as  well,  is  the  schedule.  The  schedule 
is  an  embodiment  of  our  best  understand¬ 
ing  of  how  to  accomplish  the  project  ...  a 
truly  important  document.  Possibly,  the 
schedule  is  the  single  most  important  doc¬ 
ument  pertaining  to  the  project,  and  it 
likely  has  more  to  do  with  success  than 
any  other  aspect. 

Well,  then,  if  the  planned  schedule  is 
so  crucial  to  project  success,  it  follows  that 
project  managers  should  do  their  utmost 
to  ensure  project  execution  conforms  to 
it.  Assuming  the  planned  schedule  is  the 
most  efficient  path  for  executing  the  pro¬ 
ject,  any  deviation  leads  to  inefficiency  and 
very  likely  other  problems  such  as  con¬ 
straint  reduced  production,  idle  time,  skills 
mismatch,  and  poor  quality  output,  and  in 
turn,  requires  rework.  Thus,  there  is  an 
extremely  compelling  case  for  following 
the  planned  schedule. 

®  Capability  Maturity  Model  is  registered  in  the  U.S.  Patent 
and  Trademark  Office  by  Carnegie  Mellon  University 


This  article  presents  a  proposed 
method  for  measuring  the  conformance, 
or  adherence,  for  the  schedule  execution 
of  a  project.  Utilizing  the  method  and 
measure,  the  project  manager  has  a  better 
understanding  of  how  well  the  execution 
follows  the  sequence  and  precedence  of 
the  tasks  in  the  baseline  schedule.  Having 
an  indicator  for  schedule  adherence  provides 
additional  early  warning  information  for 
managers  to  act  upon. 

Schedule  Performance 
Efficiency  Versus  Schedule 
Adherence 

What is meant by schedule adherence? Does it mean that the
project is performing such that objectives are achieved at the
time predicted or planned? Certainly project managers want to
know that interim products are being produced and delivered on
time. This type of schedule performance indicator can be
constructed in a number of different ways, such as the portion
or percent of milestones, objectives, or interim products
achieved on time. In fact, the EVM Schedule Performance
Indicator (SPI) is of this type. However, SPI has much finer
resolution than the very coarse measures mentioned; its
increment of measure is cost, earned and planned. This
discussion of SPI is equally applicable to the time-based
schedule performance efficiency indicator from ES, SPI(t).

All of these indicators, including SPI and SPI(t), describe the
efficiency of achieving the plan. However, they do not provide
information about how the products, milestones, objectives, or
earned value were achieved. For example, these indicators cannot
describe whether or not completion of milestone 2 followed
milestone 1. If the milestone schedule indicates that at status
period 3 we should have completed two milestones and we have
completed two, it would appear from the indicator (milestone
percent completed = 100 percent) that all is well. But what if
the two milestones are numbers one and three while the second
milestone is still in work? Is there anything possibly wrong?
After all, the project has met its two-milestone objective.

For  the  EVM  schedule  efficiency  indi¬ 
cator,  SPI,  there  is  no  concern  as  to 
whether  the  earned  value  (EV)  accrued 
matches  the  expectation  of  the  schedule. 
In  most  cases,  project  managers  would 
celebrate  an  SPI  =  1.0  because  it  is  so  sel¬ 
dom  achieved,  and  consequently  would 
not  question  whether  the  EV  accrued  is, 
in  fact,  the  expected  planned  value  (PV). 
Again, the question is raised: Should the project manager be
concerned with the performance sequence, i.e., how the
achievement occurred? Does it make any difference?

Over  the  last  20  years,  nearly  every 
industry  experienced  several  initiatives 
intended  to  improve  project  performance 
and  product  quality:  Statistical  Process 
Control,  Total  Quality  Management,  the 
Software  Engineering  Institute  Capability 
Maturity  Model®,  and  the  International 
Organization  Standard  for  Quality 
Management Systems 9001. The fundamental idea from all of these
process improvement efforts is the following: Undisciplined
execution leads to inefficient performance and defective products.

Does this thinking apply to project plans, too? Of course it does; the planned
schedule describes the execution process. Therefore, it is not enough to measure
the execution efficiency. Additionally, project managers (PMs) need to know how
well the process is being followed. By maintaining process integrity, PMs can
maximize the project's performance and minimize its rework and delivery of
defective products. An indicator for adherence to the schedule provides the
measure needed by PMs for monitoring and controlling the project execution.

Measuring  and  Indicating 
Schedule  Adherence 

The idea for measuring schedule adherence is simply stated in this question: Did
the accomplishment match exactly the expectation from the planned schedule? This
is not the same as the preceding discussion of schedule performance efficiency,
where the volume of actual work accomplished is compared to the expected volume
from the schedule. Schedule adherence is a more restrictive measure, and it is
independent of performance efficiency.

A recent enhancement to EVM, ES, provides a means to measure schedule adherence.
ES is derived from two EVM measures, PV and EV [1]. The accumulated planned
value from the project start to its planned completion is the performance
measurement baseline (PMB) [2]. ES is the time duration associated with the PMB
where the PV is equal to the EV accrued.

The concept of ES is illustrated by Figure 1. Arrow A projects the accrued value
of EV onto the PMB to identify the point at which PV equals EV. Arrow B
identifies the time at which PV equals the EV accrued, i.e., the planned
duration earned, or ES. The EV accrued is reported at period seven, whereas ES
is determined to be five periods, i.e., the time measured along the PMB at which
PV equals the EV accrued at Time Now, or Actual Time (AT).

Two comparative measures, SVt and SVc, are shown in the diagram to illustrate
the difference between the time-based and cost-based indicators of ES and EVM,
respectively. The traditional EVM schedule variance is SVc, while the time-based
schedule variance from ES is SVt. From the numbers shown in the diagram, SVt can
be easily computed: SVt = ES - AT = 5 - 7 = -2. Assuming the units are months,
the project is two months behind its planned schedule.
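
As a rough sketch of the projection described above, ES can be computed by locating the accrued EV on the cumulative PV curve. The PMB values below are hypothetical, chosen only so that the result matches the ES = 5, AT = 7 example; the linear interpolation within a period is an assumption of this sketch and is not stated in the text.

```python
def earned_schedule(pmb, ev):
    """Return ES in periods.

    pmb -- cumulative planned value at the end of periods 1..N (non-decreasing)
    ev  -- earned value accrued at the status date
    """
    if ev <= 0:
        return 0.0
    previous_pv = 0.0
    for whole_periods, pv in enumerate(pmb):
        if ev < pv:
            # EV falls inside this period: interpolate linearly (an assumption).
            return whole_periods + (ev - previous_pv) / (pv - previous_pv)
        previous_pv = pv
    return float(len(pmb))  # all planned value has been earned


pmb = [5, 12, 20, 30, 40, 48, 55, 60, 62]  # hypothetical PMB, total PV = 62
actual_time = 7                            # AT, in periods

es = earned_schedule(pmb, ev=40)
sv_t = es - actual_time    # time-based schedule variance, SVt = ES - AT
spi_t = es / actual_time   # time-based schedule performance index, SPI(t) = ES / AT

print(es, sv_t)  # 5.0 -2.0
```

With these numbers the EV of 40 units corresponds exactly to the PV planned for the end of period five, so no interpolation is needed and SVt comes out to -2 periods.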

The performance expectation for the planned schedule is embodied in the PMB.
This is a consequence of the PMB being the result of summing time-phased PV
across all tasks in the schedule. Figure 2 is used to illustrate the
relationship. The figure shows a network schedule at the top with the PV curve
beneath it.

The connection between EVM and the schedule provided by ES is remarkable.
Regardless of the project's actual position in time, we have information
describing the portion of the planned schedule that should have been
accomplished. That is, for a claimed amount of EV at a status point AT, the
portion of the PMB that should be accomplished is identified by ES. Put another
way, the value of ES indicates where the task performance of the project should
be for that amount of duration of the planned schedule. As shown by Figure 2,
specific tasks make up that portion of the schedule. The darker shaded areas of
the task blocks indicate the portions planned to be completed. If the schedule
is adhered to, we will observe in the actual performance the identical tasks at
the same level of completion as the tasks that make up the plan portion
identified by ES. By adhering to the planned sequence of tasks, the manager is
assured during project execution that the predecessors to the tasks in work are
complete.

It is more than likely that the project is not performing synchronously with the
schedule; EV is not being accrued in accordance with the plan. As seen in
Figure 3, the accumulated EV is the same quantity depicted in Figure 2, but its
task distribution is different. Figure 3 is a graphical illustration of the
earlier discussion of the reasons for process discipline. The lagging
performance for tasks to the left of ES indicates the possibility of a
constraint or impediment. Performance may be lagging behind the expectation
because something is preventing it from occurring. The EV indicated to the right
of ES shows tasks performed at risk; they will likely have significant rework
appearing later in the project.

Both sets of tasks, lagging and ahead, cause poor efficiency. Of course, for the
lagging tasks, impediments and constraints make progress more difficult.
Concentrating management efforts on alleviating the impediments and constraints
will have the greatest positive impact on project performance.

The darkened tasks to the right of ES indicate performance resulting from
impediments and constraints or poor process discipline. Frequently, they are
executed without complete information. The performers of these tasks must
necessarily anticipate the inputs expected from the incomplete preceding tasks;
this consumes time and effort and has no associated EV. Because the anticipated
inputs are very likely misrepresentations of the future reality, the work
accomplished (EV accrued) for these tasks usually contains significant amounts
of rework. Complicating the problem, the rework created for a specific task will
not be recognized for a period of time. The need for rework will not be apparent
until all of the inputs to the task are known or its output is recognized to be
incompatible with the requirements of a subsequent task.

Figure 3: Actual Distribution of EV

This conceptual discussion leads to the measurement of schedule adherence. By
determining the EV for the actual tasks performed congruent with the project
schedule, a measure can be created. The adherence-to-schedule characteristic, P,
is described mathematically as a ratio:

$$P = \sum EV_j / \sum PV_j$$

$PV_j$ represents the PV for a task associated with ES. The subscript $j$
denotes the identity of the tasks from the schedule that comprise the planned
accomplishment. The sum of all $PV_j$ is equal to the EV accrued at AT. $EV_j$
is the EV for the $j$ tasks, limited by the value attributed to the planned
tasks, $PV_j$.

Consequently, the value of P represents the proportion of the EV accrued that
exactly matches the planned schedule.

Recall the question with which we began: Did the accomplishment match exactly
the expectation from the schedule? The P-Factor answers the question and thus is
the performance indicator of schedule adherence sought.

A characteristic of the P-Factor is that its value must be between zero and one;
by definition, it cannot exceed one. A second characteristic is that P will
exactly equal 1.0 at project completion. P equal to zero indicates that the
project accomplishment thus far is not at all in accordance with the planned
schedule. Conversely, P equal to one indicates perfect conformance.

When the value for P is much less than 1.0, i.e., poor schedule adherence, the
project manager has a strong indication that the project is experiencing an
impediment, the overload of a constraint, or poor process discipline.
Conversely, when the value of P is very close to 1.0, the PM can feel confident
that the schedule is being followed and that milestones and interim products are
occurring in the proper sequence. The PM thus has an indicator derived from ES
which further enhances the description of project performance portrayed by EVM
alone.

Example Application

Table 1 contains notional data that relates to Figure 3. The task numbers from
the table are identified, as well, in the network diagram of the figure. The
total PV for the hypothetical project is 62 units. The total EV accrued at AT is
40 units; the task distribution of EV is beneath the column heading EV at AT.
The task distribution of PV for the ES duration is shown in the PV at ES column.

Table 1: Schedule Adherence Example

Task     PV    PV at ES    EV at AT    EV-PV    I/C or R
1        10    10          10            0
2        12     9           5           -4      I/C
3        10    10          10            0
4         5     5           3           -2      I/C
5         5     2           5           +3      R
6         8     4           3           -1      I/C
7         7     0           1           +1      R
8         5     0           3           +3      R
Total    62    40          40            0

By calculating the difference, EV minus PV, between the two distribution
columns, we can determine which tasks may have impediments or where a constraint
has developed. Those tasks are identified by the negative values in the EV-PV
column and recorded as a possible impediment or constraint (I/C) in the last
column of Table 1; they are tasks 2, 4, and 6. The PM should investigate those
three tasks for removal of impediments or alleviation of the constraints.

Should no impeding problem be found, the PM has reason to suspect inappropriate
performance by members of the project team, i.e., poor process discipline. It
may be discovered that a person assigned one of the tasks identified is
insufficiently skilled or trained. This never happens, does it? The employee, in
order to maintain a satisfactory efficiency for his performance review, executed
a downstream task because it was something he knew how to do. (For this example,
the employee is compelled to do the wrong thing. Let us hope that management
fully examines the problem and recognizes its own culpability.)

The EV-PV column also indicates positive differences for three tasks: 5, 7, and
8. These tasks are not being performed synchronously with the schedule and are
at risk of generating rework, as indicated by the letter R recorded in the
table. It is obvious from Figure 3 that tasks 7 and 8 are at risk because some
or all of the required inputs to them are absent. However, the risk of task 5 is
not so obvious; all of its required inputs are available. With respect to ES, it
should be only partially complete. Task 5 completion is not synchronous with the
planned execution at the ES duration. Rework can be generated in this case as
well; it is never wise to be too far out in front.

Figure 4: Project Management Indicators

To further explain, as the project progresses the detail for task accomplishment
becomes much clearer. Oftentimes subtle changes to task requirements are made
due to the learning gained during the development process from the prior task
accomplishment. By working ahead, the developer unknowingly presumes that his
work is unaffected by the other facets of the project. When this occurs, the
task worker is not performing synchronously with the plan and the risk of rework
is created.
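
The task-by-task comparison described for Table 1 can be sketched in a few lines. The rows are copied from the PV at ES and EV at AT columns of the table; the function name and data layout are illustrative assumptions.

```python
# (task, PV at ES, EV at AT), copied from Table 1.
table_1 = [
    (1, 10, 10), (2, 9, 5), (3, 10, 10), (4, 5, 3),
    (5, 2, 5), (6, 4, 3), (7, 0, 1), (8, 0, 3),
]


def classify(rows):
    """Flag each task whose EV differs from the PV planned at ES.

    Negative EV - PV -> "I/C" (possible impediment or constraint)
    Positive EV - PV -> "R"   (work ahead of plan, at risk of rework)
    """
    flags = {}
    for task, pv_at_es, ev_at_at in rows:
        difference = ev_at_at - pv_at_es
        if difference < 0:
            flags[task] = "I/C"
        elif difference > 0:
            flags[task] = "R"
    return flags


flags = classify(table_1)
print(flags)  # {2: 'I/C', 4: 'I/C', 5: 'R', 6: 'I/C', 7: 'R', 8: 'R'}
```

Tasks 1 and 3 do not appear in the result because their EV matches the plan exactly.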

What is the value of the P-Factor for this example? From review of the PV at ES
column, the tasks to be included in the calculation are 1 through 6; the sum of
PV at ES equals 40. The sum of the EVs in agreement with the PVs is found from
the values of tasks 1 through 6 in the EV at AT column. The sum of the values
for these tasks is 36. However, recall that task 5 is three units ahead of where
it should be with respect to the amount of PV planned for that point in time.
Subtracting the three units, the EV sum in agreement with the schedule equals
33. As can be seen, another way to calculate the EV in agreement is to add the
sum of the negative entries in the EV-PV column to the total EV accrued; i.e.,
40 + (-7) = 33. P can now be calculated as follows:

$$P = \sum EV_j / \sum PV_j = 33 / 40 = 0.825$$

Thus, roughly 83 percent of the execution is in conformance with the schedule.
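
The same calculation can be written as a short sketch. Limiting each task's EV to its planned value is expressed here with `min`, which is one way to read the definition of EVj; the data layout and function name are assumptions for illustration.

```python
# (PV at ES, EV at AT) for tasks 1 through 8, copied from Table 1.
rows = [(10, 10), (9, 5), (10, 10), (5, 3), (2, 5), (4, 3), (0, 1), (0, 3)]


def p_factor(rows):
    """Schedule adherence: EV in agreement with the plan over PV at ES."""
    planned = sum(pv for pv, _ in rows)
    # Credit each task only up to the value planned for it at ES.
    in_agreement = sum(min(ev, pv) for pv, ev in rows)
    return in_agreement / planned


ev_in_agreement = sum(min(ev, pv) for pv, ev in rows)
p = p_factor(rows)
print(ev_in_agreement, p)  # 33 0.825
```

Tasks 7 and 8 contribute nothing because no PV is planned for them at ES, and task 5 is credited with only two of its five units.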

Let us presume that all of the claimed accomplishment not in schedule
conformance, seven units, requires rework. For this worst case, nearly 18
percent of the claimed EV must be re-accomplished for the project to complete
satisfactorily. Unless this project has considerable reserves, successful
completion within the allocated resources is very unlikely. Clearly, the manager
for this project has work to do. However, without the P-Factor indicator and the
analysis, it is not so obvious what he should investigate and take action to
correct.

Real  Data 

Figure 4 is a graph of the indicators, cost performance index (CPI), SPI(t), and
the P-Factor from real project data. For the figure, CPI is the CPI from EVM and
the Percent Complete of the x-axis is determined from EV divided by the Budget
at Completion (BAC) [1]. As you can see, the schedule adherence (P-Factor) is
extremely high, even from the beginning; at 20 percent complete, P is equal to
0.93. The fact that the P-Factor is very nearly 1.0 says that the precedence of
the schedule is followed very closely throughout the period of execution shown.

Also observed is the curve fit of the P-Factor data points. The curve fit
illustrates the previously discussed behavior of P: as the project percent
complete increases, the value of P will in general approach 1.0; at completion,
P = 1.0. This behavior is observed with the curve fit line.

The plots of CPI and SPI(t) indicate a very high performing project; CPI hovers
around 1.05, while SPI(t) is generally greater than 0.98. The project is
forecast to complete under budget and slightly past its planned completion date.
A logical conjecture from the comparison of the indicators is that when the
planned schedule is closely followed, output performance is maximized, and the
project has the greatest opportunity for success. In other words, when P is
high, we can expect CPI and SPI(t) to be high as well. Although this
relationship needs verification from further research, the rationale appears
reasonable.

Summary 

ES is a measure shown over the last four years of application and research
examination to provide reliable schedule performance indicators, further
enabling duration and completion date forecasting. In this article, the
application of ES is extended, thereby facilitating identification of those
tasks which should have been accomplished for the EV accrued. From the
comparison of the actual distribution of the EV to its planned distribution, it
is shown that useful information is available to project managers concerning
possible impediments or constraints along with the identification of potential
future rework.

The measure for indicating how well the project is following its planned
schedule is Schedule Adherence, i.e., the P-Factor. Adhering to the planned
sequence of tasks assures that the predecessors to the tasks in work are
complete, thereby minimizing the potential for rework. The P-Factor enhances
project control capability by providing additional early warning information.
When employed with SPI(t) from ES and CPI from traditional EVM, the P-Factor
yields more complete project performance information. In turn, the added measure
enhances management decision making and the probability of successful project
outcomes.

Final  Remarks 

Some practitioners of EVM hold to the belief that schedule analysis can be
accomplished only through detailed examination of the network schedule. They
maintain that the understanding and analysis of task precedence and float within
the schedule cannot be accounted for by an indicator. However, detailed schedule
analysis is a burdensome activity and, if performed often, can have disrupting
effects on the project team.

ES offers calculation methods yielding reliable results, which greatly simplify
final duration and completion date forecasting. Furthermore, as described in
this article, the development of ES has led to a new and potentially powerful
indicator of schedule performance, i.e., Schedule Adherence.

Future research of the proposed Schedule Adherence Indicator is encouraged. To
promote experimentation and usage of the measure, the P-Factor calculator is
made available for download at <www.earnedschedule.com/Calculator.shtml>.

Notes

1. The schedule performance indicator from EVM is symbolized by SPI. SPI is
equal to the EV divided by the PV at a specific time; i.e., SPI = EV / PV [1].

2. The time-based schedule performance indicator from ES is SPI(t) and is equal
to the earned schedule divided by the actual duration (or actual time, AT);
i.e., SPI(t) = ES / AT [2].

3. The EVM and ES definitions of SVc and SVt, respectively, are as follows:

A Review of Boundary Value Analysis Techniques


Dr.  David  J.  Coe 
The  University  of  Alabama  in  Huntsville 

Software testing is an essential element of any software development effort. Developers must have some means of selecting tests
to evaluate the completeness and quality of the product produced. This article reviews Boundary Value Analysis (BVA), a
functional testing methodology that can assist in the identification of an effective set of tests.


Software testing is a fundamental software engineering activity critical to a
successful development effort. In fact, an increasingly popular approach to
software development is that of test-driven development, in which tests are
identified and documented prior to implementation of the code [1]. The
test-driven approach to development places an emphasis on the quality of the
resulting product by establishing completeness and correctness criteria early. A
major challenge to any testing effort is that one must identify a set of tests
that are effective at finding defects while keeping the resources associated
with applying those tests within project cost and schedule constraints.

The following is an overview of BVA, a systematic methodology for identifying
tests to apply. In the following discussion, the term test case refers to “a set
of inputs, execution conditions, and expected results developed for a particular
objective, such as to exercise a particular program path or to verify compliance
with a specific requirement.” A test is defined as either “a set of one or more
test cases” or as “the execution of the test cases.” A fault is “an incorrect
step, process, or data definition in a computer program,” and a failure is the
“inability of a system or component to perform its required functions within
specified performance requirements” [2]. Thus, a primary goal of software
testing is to identify failures, which indicate the presence of one or more
faults [3].

Overview  of  BVA 

BVA is a black-box approach to identifying test cases. In black-box testing,
test cases are selected based upon the desired product functionality as
documented in the specifications without consideration of the actual internal
structure of the program logic [4]. A fundamental assumption in BVA is that the
majority of program errors will occur at critical input (or output) boundaries,
places where the mechanics of a calculation or data manipulation must change in
order for the program to produce a correct result [3].

An example that illustrates the general concept of boundary values would be a
program that calculates income tax for a given income. In a progressive income
tax scheme, the tax rate applied increases from low-income brackets to
high-income brackets. In this case, the critical input boundaries would be the
set of incomes at which the applied tax rate should change, along with any
minimum or maximum extremes of the income value. Thus, the set of boundary
incomes defines the limits of each tax bracket.
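
To make the income tax illustration concrete, the sketch below uses an entirely hypothetical three-bracket scheme (the thresholds and rates are invented for illustration and do not come from any real tax code). It shows where the applied rate changes and which incomes sit on or next to those boundaries.

```python
from bisect import bisect_right

# Hypothetical brackets: lower bound of each bracket and its rate in percent.
LOWER_BOUNDS = [0, 10_000, 40_000]
RATES_PERCENT = [10, 20, 30]


def rate_for(income):
    """Return the rate (percent) of the bracket that contains the income."""
    if income < 0:
        raise ValueError("income must be non-negative")
    return RATES_PERCENT[bisect_right(LOWER_BOUNDS, income) - 1]


def boundary_incomes():
    """Incomes at each bracket boundary plus their immediate neighbours."""
    incomes = set()
    for bound in LOWER_BOUNDS:
        incomes |= {bound, bound + 1}
        if bound > 0:
            incomes.add(bound - 1)
    return sorted(incomes)


for income in boundary_incomes():
    print(income, rate_for(income))
```

The interesting cases are the pairs on either side of a threshold, such as 9,999 and 10,000, where the applied rate is expected to change.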

Test  Case  Selection  Using 
BVA 

The set of test cases identified by BVA depends upon both the reliability
requirements of the software under test and the underlying assumptions on the
likelihood of single versus multiple range-checking faults. The following
discussions of single-variable and multi-variable BVA are derived from the BVA
taxonomy and discussion presented in [3].

Single-Variable  BVA 

The baseline procedure for BVA begins by identifying the boundary values,
typically from the input point of view. All of these boundary values will be
incorporated into the set of test cases. In addition to those values, values
near the boundaries will be tested. These boundary-adjacent values help to
exercise the program's bounds-checking logic. For example, when testing the
range of a value in a branching or looping statement, the developer may use a
less-than operator, '<', when the correct operator should have been a
less-than-or-equal-to operator, '<=', or a greater-than operator, '>', which is
adjacent to the less-than operator on most keyboard layouts. Such errors would
result in code that compiles but executes incorrectly under certain conditions.
To test for these types of errors, values adjacent to the boundary values must
be included in the set of test cases. In addition to the boundary and
boundary-adjacent values, the baseline BVA procedure includes some nominal value
of input (or output) in the set of test cases. The baseline BVA procedure is
best illustrated by the following example.

Figure 1: Baseline BVA Test Cases Identified for Single-Variable, Single-Range Example (Shaded area indicates valid values of the variable N)

Figure 2: Single-Variable, Single-Range Baseline Test Cases Augmented With Robustness Tests (Shaded area indicates valid values of the variable N)

Figure 3: Single-Variable, Two-Range Test Cases Identified by Robust BVA (Highlighted areas indicate the two subranges of valid values of M)

Figure 4 A, B, C: Single-Fault, Baseline, and Robust Test Cases (A) assuming N is at its nominal value, (B) assuming M is at its nominal values for each subrange, and (C) the set of all test cases identified (derived from [5])

Consider a program with a single input variable N that has an output defined
only for values of N in the range $a \le N \le c$. The set of test cases
selected would be at minimum the set of values
$N_{\text{baseline}} = \{a, a^+, b, c^-, c\}$, where $a^+$ is a value just
greater than $a$, $c^-$ is a value just less than $c$, and the value $b$ is some
nominal value that lies between $a^+$ and $c^-$. In this example, the baseline
BVA procedure identifies five test cases. As graphed on the number line in
Figure 1, the test cases selected under the baseline procedure do not exceed the
allowable range of inputs for the variable N.
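
The baseline procedure can be sketched for integer inputs as follows. The step size of 1 for the adjacent values, the midpoint as the nominal value, and the two range-check functions are all assumptions made for illustration; the buggy check reproduces the '<' versus '<=' mistake discussed earlier.

```python
def baseline_values(a, c):
    """Baseline BVA values {a, a+, b, c-, c} for integers (step of 1 assumed)."""
    b = (a + c) // 2  # nominal value; the midpoint is an arbitrary choice
    return [a, a + 1, b, c - 1, c]


def in_range_correct(n, a, c):
    """Intended check: a <= n <= c."""
    return a <= n <= c


def in_range_buggy(n, a, c):
    """Faulty check: '<' typed where '<=' was intended at the lower bound."""
    return a < n <= c


a, c = 1, 100
values = baseline_values(a, c)
failures = [n for n in values
            if in_range_buggy(n, a, c) != in_range_correct(n, a, c)]
print(values)    # [1, 2, 50, 99, 100]
print(failures)  # [1]
```

Only the boundary value itself exposes this particular fault; a nominal value such as 50 would pass under both versions of the check. The sketch assumes the range is wide enough (c - a of at least 4) for the five values to be distinct.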

If error handling is critical to the software under test, then one augments the
set of test cases identified by the baseline BVA procedure to include robustness
tests, that is, values outside the allowable range. The baseline tests
identified above are augmented with the values $\{a^-, c^+\}$, where $a^-$ is a
value just below the minimum acceptable value $a$ and $c^+$ is a value just
above the maximum acceptable value $c$. The inclusion of the values
$\{a^-, c^+\}$ in the set of test cases should force execution of any exception
handler or defensive code. In this single-input, single-range example, robust
BVA identifies a total of seven test cases, as shown in Figure 2, where
$N_{\text{robust}} = \{a^-, a, a^+, b, c^-, c, c^+\}$.
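
A sketch of the robust variant, again for integers with an assumed step of 1, is shown below. The function under test, `set_level`, is hypothetical; it stands in for any routine with defensive code that rejects out-of-range input.

```python
def robust_values(a, c):
    """Robust BVA values {a-, a, a+, b, c-, c, c+} for integers (step of 1 assumed)."""
    b = (a + c) // 2  # nominal value; the midpoint is an arbitrary choice
    return [a - 1, a, a + 1, b, c - 1, c, c + 1]


def set_level(n, low, high):
    """Hypothetical function under test: rejects values outside low..high."""
    if not low <= n <= high:
        raise ValueError(f"{n} is outside {low}..{high}")
    return n


def run_robust_tests(a, c):
    """Map each robust test value to 'accepted' or 'rejected'."""
    outcomes = {}
    for n in robust_values(a, c):
        try:
            set_level(n, a, c)
            outcomes[n] = "accepted"
        except ValueError:
            outcomes[n] = "rejected"
    return outcomes


outcomes = run_robust_tests(1, 100)
print([n for n, result in outcomes.items() if result == "rejected"])  # [0, 101]
```

The two added values are the only ones that reach the exception path, which is the purpose of the robustness tests.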

The baseline BVA or robust BVA procedures may also be applied in situations
where an input may have multiple subranges. Consider a single input M with two
adjacent subranges where range #1 is given by $d \le M < f$ and range #2 is
given by $f \le M \le h$. The set of test cases would be the union of the test
cases identified by applying the BVA procedure to each individual subrange. So,
the union of test cases resulting from the application of baseline BVA to each
subrange individually is given by the following:

$$M_{\text{baseline}} = \{d, d^+, e, f^-, f\} \cup \{f, f^+, g, h^-, h\} = \{d, d^+, e, f^-, f, f^+, g, h^-, h\}$$

Application of robust BVA augments $M_{\text{baseline}}$ with the extreme values
$\{d^-, h^+\}$ to yield
$M_{\text{robust}} = \{d^-, d, d^+, e, f^-, f, f^+, g, h^-, h, h^+\}$ as
illustrated in Figure 3. The addition of multiple subranges clearly increases
the total number of test cases identified. For two adjacent subranges of a
single variable, baseline BVA identified nine test cases and robust BVA
identified 11 test cases total.
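
The union over subranges can be sketched for integer inputs as shown here. The concrete boundaries (0, 50, 100), the step of 1, and the midpoint nominal values are assumptions for illustration; the shared boundary is counted once because the values are collected in a set.

```python
def subrange_values(low, high):
    """Baseline values for one subrange (integer step of 1 assumed)."""
    nominal = (low + high) // 2  # arbitrary nominal choice
    return {low, low + 1, nominal, high - 1, high}


def multi_range_values(boundaries, robust=False):
    """Union of baseline values over adjacent subranges.

    boundaries -- sorted boundary values, e.g. [d, f, h] for ranges d..f and f..h
    robust     -- also add the values just outside the overall extremes
    """
    values = set()
    for low, high in zip(boundaries, boundaries[1:]):
        values |= subrange_values(low, high)
    if robust:
        values |= {boundaries[0] - 1, boundaries[-1] + 1}
    return sorted(values)


baseline = multi_range_values([0, 50, 100])
robust = multi_range_values([0, 50, 100], robust=True)
print(len(baseline), len(robust))  # 9 11
```

The counts match the nine baseline and 11 robust test cases cited above for two adjacent subranges.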

Multi-Variable  BVA 

The BVA test case selection procedure for multi-variable problems also requires
consideration of fault likelihood, what I refer to as a fault model. Under the
single-fault model, it is assumed that a failure is the result of a single fault
due to the low probability of two or more faults occurring simultaneously [3].
For the multiple-fault model, one assumes that the likelihood of multiple
simultaneous faults is no longer insignificant, and thus additional test cases
must be selected to address situations such as erroneous range checking on
multiple variables simultaneously.

Drawing from our previous single-variable examples, assume the single-fault
model for a problem that has two inputs, N and M, with values of N in the
allowable range $a \le N \le c$ and where allowable values of M span range #1,
given by $d \le M < f$, and range #2, given by $f \le M \le h$. From our
previous discussion, the baseline single-variable test cases identified for N
and M respectively are the following:

$$N_{\text{baseline}} = \{a, a^+, b, c^-, c\}$$

and

$$M_{\text{baseline}} = \{d, d^+, e, f^-, f, f^+, g, h^-, h\}$$

Under the single-fault assumption, multi-variable BVA test cases are selected
that exercise the boundaries of one variable while the other variables are held
at a nominal value. The final set of test cases selected is the union of all
test cases identified as this procedure is applied to each individual input in
turn. In the following example, I have chosen to apply this procedure to each
subrange of each variable in turn to produce a symmetric solution.

Since we have assumed that there are two inputs in this problem, the set of test
cases will consist of ordered pairs of inputs $(m,n)$ such that $n$ is a member
of $N_{\text{baseline}}$ and $m$ is a member of $M_{\text{baseline}}$. Figure 4A
shows a graph of the nine test cases identified assuming that $n$ is held to its
nominal value $b$ while $m$ varies across the members of $M_{\text{baseline}}$.
The graph in Figure 4B shows the 10 test cases identified in which $m$ is held
to its nominal value, $e$ or $g$, while $n$ varies across the members of
$N_{\text{baseline}}$. Figure 4C illustrates the union of these sets of test
cases. Note that due to the selection of $(e,b)$ and $(g,b)$ twice, a total of
17 test cases have been identified instead of 19.

For robustness testing, one applies the same procedure starting with the values
previously identified in the sets $M_{\text{robust}}$ and $N_{\text{robust}}$.
Note that under the single-fault model, robustness testing adds only six
additional test cases to the 17 baseline test cases for a total of 23 test
cases. These additional tests are identified in Figure 4C.
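
The single-fault selection can be checked with a short sketch that treats the test values symbolically, using the same labels as the text. The function name and the use of strings for the values are illustrative choices.

```python
N_BASELINE = ["a", "a+", "b", "c-", "c"]
M_BASELINE = ["d", "d+", "e", "f-", "f", "f+", "g", "h-", "h"]
N_ROBUST = ["a-"] + N_BASELINE + ["c+"]
M_ROBUST = ["d-"] + M_BASELINE + ["h+"]

N_NOMINALS = ["b"]       # nominal value of N
M_NOMINALS = ["e", "g"]  # nominal value of M in each subrange


def single_fault_cases(m_values, n_values):
    """Vary one input across its values while the other stays at a nominal value."""
    cases = {(m, n) for m in m_values for n in N_NOMINALS}   # n held at b
    cases |= {(m, n) for m in M_NOMINALS for n in n_values}  # m held at e or g
    return cases


baseline = single_fault_cases(M_BASELINE, N_BASELINE)
robust = single_fault_cases(M_ROBUST, N_ROBUST)
print(len(baseline), len(robust), len(robust - baseline))  # 17 23 6
```

Because the cases are held in a set, the pairs (e, b) and (g, b) are counted once, which accounts for 17 rather than 19 baseline cases.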

Under the multiple-fault assumption, additional test cases must be selected to
detect multiple, simultaneous faults such as erroneous range checking on two
variables at the same time. The multiple-fault BVA procedure again starts with
the sets $M_{\text{baseline}}$ and $N_{\text{baseline}}$ if bounds checking is
not critical, or $M_{\text{robust}}$ and $N_{\text{robust}}$ if bounds checking
is a high priority. To select BVA test cases assuming that multiple simultaneous
faults are likely, one computes the Cartesian product
$M_{\text{baseline}} \times N_{\text{baseline}}$ for the baseline multiple-fault
test cases or $M_{\text{robust}} \times N_{\text{robust}}$ for the robust
multiple-fault, i.e., worst-case, test cases [3].

Given two sets M and N, the Cartesian product of M and N is defined as follows:

$$M \times N = \{(m,n) \mid m \in M \land n \in N\}$$

where $(m,n)$ denotes an ordered pair [5]. In other words, $M \times N$ is the
set that consists of all possible ordered pairings of an element from set M with
an element of set N. So, if set M contains $x$ elements and set N contains $y$
elements, the resulting set $M \times N$ will contain a total of $x \cdot y$
elements.

Figure 5 depicts the baseline and robust BVA test cases identified for our sample problem
assuming the multiple-fault model. Note the significant increase in the total number of
tests identified. Forty-five baseline test cases were identified for this problem plus an
additional 32 for worst-case robustness testing.
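
A minimal sketch of the multiple-fault selection follows, using Python's itertools.product to form the Cartesian products. As before, the numeric values are hypothetical placeholders; only the set sizes (9 and 5 baseline values, 11 and 7 robust values) come from the counts reported in the text.

```python
from itertools import product

# Hypothetical boundary values (assumption); only the set sizes matter here.
M_BASELINE = [0, 1, 5, 9, 10, 11, 15, 19, 20]   # 9 members
N_BASELINE = [0, 1, 50, 99, 100]                # 5 members
M_ROBUST = [-1] + M_BASELINE + [21]             # 11 members
N_ROBUST = [-1] + N_BASELINE + [101]            # 7 members

# Every pairing of an m value with an n value is a test case.
baseline_cases = set(product(M_BASELINE, N_BASELINE))
worst_case = set(product(M_ROBUST, N_ROBUST))

print(len(baseline_cases))               # 9 * 5
print(len(worst_case))                   # 11 * 7
print(len(worst_case - baseline_cases))  # tests added by robustness
```

The growth is multiplicative: each additional boundary value for one variable adds as many tests as the other variable has values, which is why the multiple-fault totals rise so much faster than the single-fault totals.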

Table 1 summarizes the number of test cases identified versus various reliability
requirements and fault model assumptions. The multiple-fault assumption significantly
increases the total number of tests required, especially in situations where a variable
of interest has multiple ranges. Under the single-fault assumption, the incorporation of
robustness tests, even in the situation where a variable has multiple ranges, results in
a modest increase in the total number of test cases required.

Discussion 

From the previous review it is clear that BVA has several advantages. The mechanical
nature of the procedure and the symmetry of the tests identified make the BVA procedure
easy to remember and use, especially given that critical input boundaries are often
already explicitly identified in the requirements. With BVA, one can adjust the number of
test cases identified and, thus, the resources expended on the testing effort, depending
upon the robustness demands of the product.

BVA also serves as an introduction to other test techniques. Discussions of BVA in the
literature are often intermingled with a related black-box technique known as Equivalence
Partitioning (EP), which utilizes the boundary values in an attempt to define partitions
or sets of test cases that are equivalent in the sense that all test cases grouped within
a particular partition would reveal the presence of the same set of defects and likewise
fail to detect other defects. In its simplest form, once the partitions are identified,
the set of test cases selected is one representative test case from each partition. A
distinct advantage of the EP technique is that the total number of test cases is
significantly smaller than the set of test cases identified through BVA. In fact, the set
of test cases identified by EP can be a subset of those identified by BVA, and
researchers have exploited this fact to reduce the total number of test cases identified
in merged BVA-EP schemes.
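
The relationship between the two techniques can be illustrated for a single integer input with one valid range. This is only a sketch under stated assumptions: the function names are hypothetical, the range 1 to 100 is invented for the example, and choosing the just-out-of-range values as representatives of the invalid partitions is one possible choice among many.

```python
def bva_values(lo, hi, robust=False):
    """Boundary values for one integer range: min, min+, nominal, max-, max."""
    nominal = (lo + hi) // 2
    values = [lo, lo + 1, nominal, hi - 1, hi]
    if robust:
        values = [lo - 1] + values + [hi + 1]  # add just-out-of-range values
    return values

def ep_representatives(lo, hi):
    """One representative per partition: below range, in range, above range."""
    return [lo - 1, (lo + hi) // 2, hi + 1]

bva = bva_values(1, 100, robust=True)
ep = ep_representatives(1, 100)
print(bva)
print(ep)
print(set(ep) <= set(bva))  # EP picks are drawn from the BVA set
```

With these choices EP needs three tests where robust BVA needs seven, and every EP test already appears in the BVA set, which is the property the merged BVA-EP schemes exploit.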

Studies show, however, that BVA can be effective at identifying failures. In [6], Reid
investigated the effectiveness of random testing, equivalence partitioning, and boundary
value analysis techniques.

According to his results, the probability that BVA would detect a fault was more than six
times higher than random testing and more than twice as high as equivalence partitioning.
The cost for this increased effectiveness was additional test cases, on the order of two
to three times as many test cases as equivalence partitioning, depending upon the
particular variations of the techniques employed.

Figure 5: Multiple-Fault, Baseline, and Robust Tests (derived from [3])

Other studies have compared functional, structural, and code reading test methodologies.
In structural testing, test cases are selected to exercise specific program elements such
as statements, branches, or paths through a code segment. For example, to achieve 100
percent statement coverage, the set of test cases identified must force execution of each
program statement at least once. For 100 percent branch coverage, the set of test cases
identified must force each branch option to execute at least once. For code reading,
individuals were given the source code and asked to work backwards towards a
specification for that program by successively grouping subprograms into logical modules
until an understanding of the overall functionality was achieved. Failures were detected
by comparing the actual specification to that derived by the code reader.
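
The difference between statement coverage and branch coverage can be seen in a small, hypothetical function. In this sketch (not taken from the studies cited), the function records which way its single branch went so that the two test suites can be compared; a real project would use a coverage tool instead of this manual bookkeeping.

```python
def clamp_nonnegative(x, branch_outcomes):
    """Return x, or 0 if x is negative, recording the branch outcome taken."""
    result = x
    is_negative = x < 0
    branch_outcomes.add(is_negative)  # True: branch taken, False: branch skipped
    if is_negative:
        result = 0
    return result

def branch_outcomes_for(inputs):
    """Run the function on each input and collect the branch outcomes seen."""
    outcomes = set()
    for value in inputs:
        clamp_nonnegative(value, outcomes)
    return outcomes

statement_suite = [-5]     # executes every statement, including result = 0
branch_suite = [-5, 3]     # additionally exercises the branch-not-taken path

print(branch_outcomes_for(statement_suite))
print(branch_outcomes_for(branch_suite))
```

The single input -5 executes every statement, so it reaches 100 percent statement coverage, yet the case where the condition is false is never exercised. Adding a non-negative input is needed to reach 100 percent branch coverage.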

Basili and Selby [7] studied the relative effectiveness of a combined BVA and EP
functional testing approach against 100 percent statement coverage structural testing and
code reading. Among professional programmers, they found that code reading detected the
most faults followed by functional testing and then structural testing. The average
maximum statement coverage achieved by both the functional and structural testers was 97
percent, yet the functional testing approach detected more faults than did structural
testing in this study. It was also noted that the number of faults detected varied with
the type of software tested, and that the testing techniques tended to detect different
types of faults.

The relative effectiveness of a combined BVA and EP functional testing technique, 100
percent branch coverage structural testing, and code reading was compared in [8]. This
study also observed that the effectiveness of the techniques varied with both the nature
of the programs and of the faults themselves. Most importantly, this study determined
that the use of two or more test techniques together, such as functional testing and code
reading, was more effective in general than any single methodology since the techniques
were essentially complementary [8].

Table 1: Number of Test Cases Identified for Two-Variable BVA Problem

                           Number of Tests Identified
                           Assumed Fault Model
Reliability Requirement    Single-Fault    Multiple-Fault
Baseline                   17              45
Robust                     23              77

Conclusions 

The BVA technique provides a systematic procedure for evaluating the completeness and
quality of a software product. While some may find excessive redundancy in the set of
test cases generated by boundary value analysis, I have found that the symmetry and
mechanical nature of BVA help to make the procedure both easier to teach at the
undergraduate level and easy to remember and apply in practice. BVA also provides a basis
for learning other techniques, in particular, equivalence partitioning, and it is
effective as a functional testing technique for identifying failures. Empirical studies
show, however, that a combination of functional, structural, and/or code reading
techniques is generally more effective than relying upon any single methodology since the
effectiveness of the techniques varies with both the type of code being tested and the
nature of the faults.

VoIP Softphones

Voice over Internet Protocol (VoIP) provides the user with an opportunity to combine the
use of a telephone with a personal computer (PC) into what is known as a Softphone. A
Softphone allows users to place and receive calls using a PC. This article covers what a
Softphone is and the issues, such as quality of service and security, that affect
Softphones. The Technical Integration Center (TIC) currently does not recommend
significant use of Softphones in the Army due to security and certification issues.


In the past 10 years, technology has advanced to the point where telephone calls can be
placed over Internet Protocol (IP) packet networks, also known as VoIP. One of the
developments in this transition to VoIP was to turn a computer into a VoIP telephone by
loading and running a VoIP software application on the computer. This VoIP application
has come to be called a Softphone. A key motivation for using the Softphone is lower
cost, because the Softphone is little more than software, whereas a traditional telephone
is mostly or entirely hardware. Softphones can also make calls over the Internet with
little additional equipment. This can save on long-distance charges, especially when
talking to another Softphone. Other advantages of the Softphone include potential
integration with other applications, no space needed on the desk for a telephone, and the
ability to move one’s phone number with a computer.

What  Is  a  Softphone? 

For this article, a Softphone is a VoIP client application running on a computer. The
Softphone uses VoIP signaling to establish calls, tear down calls, and take advantage of
call features such as call forwarding. The Softphone also uses VoIP protocols to
transport audio traffic in IP packets to another VoIP device. The Softphone is a client
device, as it is the user device for establishing and tearing down calls. Other
applications, such as a call processor application running on a computer, would not be
considered a Softphone. The Softphone is also a software application loaded onto a
computer and not a hardware device running in a computer.

Softphones running on a Microsoft operating system will generally use the Telephony
Application Programming Interface (TAPI), which enables PCs to support telephone
services. TAPI provides support for such features as volume control, microphone level,
speakerphone, and call control. The version in Windows XP Professional also provides
support for telephones connected to a PC via a Universal Serial Bus port.

The most common motivation for using a Softphone is avoiding long-distance telephone
charges. People using them for business can connect from a hotel room and place calls
back to the office using a PC, avoiding the hotel telephone or cell phone minutes. Home
users are able to call and talk to each other using PCs (sometimes with video added) and
avoid toll charges.

For the Department of Defense (DoD), Softphones have potential applications with tactical
users. A user could gain telephone service simply by connecting a PC to an IP network and
be able to place calls without a local call processor set up. An added advantage is that
the user’s telephone number would move with the PC, making the user more reachable.

Operational Aspects of Softphones

Operation of a Softphone is significantly different from the operation of a traditional
telephone. In order to place and receive calls at any time, the user’s computer must be
turned on and the Softphone application must be running all the time. Power must also be
provided to the computer at all times, and in the event power is lost, the computer needs
to be rebooted. This can be avoided by providing power backup to the computer in the form
of an uninterruptible power supply. The Softphone will also only be as reliable as the
computer. If the computer is not stable and has to be rebooted periodically, the
reliability of the Softphone will be affected.

Most traditional telephones have a handset the user utilizes for talking and listening.
Most Softphone applications either use the computer speakers and a microphone or use a
headset that includes both an earpiece and microphone. Answering a call with a
traditional telephone is done by picking up the handset, whereas a Softphone is answered
by clicking on an answer call icon. Likewise, ending a call with a traditional telephone
is typically done by putting the handset into the cradle, whereas a Softphone call is
ended by clicking on an icon to end the call.

Call features also work differently, and this is one of the areas where Softphones have
an advantage over traditional telephone sets. With a traditional telephone, call features
are activated by selecting different combinations of digits. For example, to have calls
forwarded a user might have to dial the digits #75 and then the call transfer number.
With a Softphone, the user would select the call transfer icon and then enter the call
transfer number. This eliminates the need to remember or look up various digit
combinations to enable call features. A number of vendor implementations of Softphones
allow the graphical user interface (GUI) on the Softphone to be used with a traditional
telephone. Each user has a computer with the GUI loaded and a separate telephone. The
telephone is used as a traditional telephone, but when the user wants to utilize a call
feature, such as forwarding a call, it is done through the GUI.

Softphones also have the advantage of integrating well with other applications. For
example, the Microsoft NetMeeting application can place calls, but it can also share an
application between users. This would enable two users to hold a conversation and share a
Word document they would both be able to see and change. Other applications that can be
integrated with the Softphone are video teleconferencing and whiteboards, which allow
both sides to write on a virtual chalkboard so that each can see what the other is
drawing. A new feature forthcoming to the Web is a Softphone built into a Web site. A
user could read a Web page, have a question, and click on a link that would provide audio
communication with someone at customer service. This ability to integrate with
applications also makes Softphones ideal for call centers. A worker in a call center
could have a conversation with a customer while other applications integrated with the
Softphone could bring up information on the customer.

Softphones have the ability to call other Softphones on the Internet or place calls to
the Public Switched Telephone Network (PSTN). Softphones can contact each other directly
over the Internet in a couple of ways. One way is to have the calling party dial the IP
address of the called party and establish a connection. Another way is to register with a
service. The service provides either a telephone number or name that is put into a
registration server along with the user’s IP address when the user registers. The calling
party receives the called party’s IP address using the registration service and
establishes the call. It should be pointed out that Network Address Translation (NAT) can
cause problems for Softphones connecting directly, and this will be covered later in more
detail.
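
The registration approach described above amounts to a lookup table from a name or telephone number to the user's current IP address. The following is a minimal in-memory sketch only; the class and method names are hypothetical, the addresses come from the ranges reserved for documentation, and a real registration server would also authenticate users and expire stale entries.

```python
class RegistrationService:
    """Toy registry mapping a user name or telephone number to an IP address."""

    def __init__(self):
        self._addresses = {}

    def register(self, user, ip_address):
        # Registering again simply replaces the previous address.
        self._addresses[user] = ip_address

    def lookup(self, user):
        # Returns None when the called party has not registered.
        return self._addresses.get(user)

registry = RegistrationService()
registry.register("555-0100", "192.0.2.10")     # called party registers
print(registry.lookup("555-0100"))              # calling party resolves the number

registry.register("555-0100", "198.51.100.7")   # same number, new location
print(registry.lookup("555-0100"))
```

Because the number is bound to whatever address was registered last, the same number follows the user from one network to another.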

Softphones can also be set up to make calls to the PSTN. This is done as part of a VoIP
solution that includes a VoIP gateway with connectivity to the PSTN. A popular option in
the commercial world is to pay for a service that includes a gateway to the PSTN. When
the Softphone connects to the PSTN, it will need to have either a real telephone number
or an extension number. The service provides the means of registering the telephone
number with the user’s IP address.

When a Softphone is loaded onto a laptop computer it has the added advantage of being
mobile. It still has the ability to connect peer-to-peer or to its PSTN service provider
when its location has changed. One interesting feature of a mobile Softphone is that its
telephone number moves with it. For example, if a user is connected with a Softphone to
the Internet in Dallas and has a Dallas telephone number, and that user disconnects, goes
to Denver and connects to the Internet there, then the user’s telephone number will still
appear to be the number from Dallas. If someone calls the user’s Dallas telephone number,
the Softphone in Denver will ring. This adds an element of convenience to the Softphone,
but also has an effect on 911 service.

911 service is designed to map the user’s telephone number to a location. When a user
dials 911, the operator is able to query a database and determine location from the
user’s phone number. When a Softphone has a telephone number assigned and stays in one
location, there is no issue with mapping this number in the database to the location.
When the Softphone has a phone number and changes location, this can pose a problem. If
the user in the previous example were to dial 911 while in Denver, the call would be
answered by an operator in Dallas, who would assume that the user was in Dallas. This
could have a serious impact on emergency services. A law was passed recently that
requires commercial providers of VoIP service (including Softphones) to offer users a
means of providing their location information. This only applies to the PSTN connection
services and not to the peer-to-peer services. The U.S. Army system has not provided such
a number-to-location Softphone service, and it is recommended that 911 calls be placed
using Softphones only as a last resort.

Technical Aspects of a Softphone

This section will discuss how a Softphone works and the protocols it uses for
communication. Figure 1 shows a typical VoIP configuration that includes a Softphone and
a PSTN gateway. For the peer-to-peer case, the configuration consists only of two or more
Softphones connected to an IP network.

The Softphone uses the registration server to register its user name (typically a mapping
of a telephone number or Uniform Resource Identifier to an IP address). Registering will
require some form of authentication, such as a personal identification number or Common
Access Card. The IP connection between the Softphone and the registration server should
be encrypted to protect authentication information.

The call processor is used for establishing calls, tearing down calls, routing calls, and
supporting call features. The Softphone exchanges call signaling messages with the call
processor. The Softphone uses the call signaling messages to establish calls to the other
devices, including gateways, IP telephones, and other Softphones.

Call signaling protocols in use today include H.323 and Session Initiation Protocol
(SIP). The H.323 and SIP protocols are designed to have the intelligence of the VoIP
system pushed to the edge, enabling end devices to call each other without requiring a
call processor. The H.323 protocol was developed by the International Telecommunication
Union and is the oldest of the protocols and currently the most heavily implemented [1].
The SIP protocol was developed by the Internet Engineering Task Force (IETF) and is
considered lighter from a code implementation and processor perspective [2]. The SIP
protocol is becoming more popular and is expected by many to replace H.323 in the future.

Figure 1: Softphone in a Typical VoIP Configuration

The actual audio traffic flows between the Softphone and other devices in packets that
use the Real-time Transport Protocol (RTP). These packets contain the audio, as well as
timing information, sequence numbering, an identifier of the type of information
(compressed voice, video, etc.), and other information. There is also a secure version of
RTP available, called Secure RTP, which provides encryption and authentication of the
voice traffic. The RTP Control Protocol (RTCP) supports RTP by conveying information
about the quality of the communication, such as jitter and packet loss.
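
The fields mentioned above sit in the 12-byte fixed header that begins every RTP packet. The sketch below unpacks that header with Python's standard struct module; the function name and the sample packet are illustrative assumptions, and the optional contributing-source list and header extension are deliberately ignored.

```python
import struct

# Fixed RTP header: two flag bytes, 16-bit sequence number,
# 32-bit timestamp, 32-bit synchronization source, network byte order.
RTP_HEADER = struct.Struct("!BBHII")

def parse_rtp_header(packet: bytes) -> dict:
    """Decode the fixed RTP header fields from the start of a packet."""
    if len(packet) < RTP_HEADER.size:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    first, second, sequence, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": first >> 6,
        "padding": bool(first & 0x20),
        "extension": bool(first & 0x10),
        "csrc_count": first & 0x0F,
        "marker": bool(second & 0x80),
        "payload_type": second & 0x7F,   # identifies the kind of media carried
        "sequence_number": sequence,     # lets the receiver detect loss and reordering
        "timestamp": timestamp,          # timing information for playout
        "ssrc": ssrc,
    }

# Made-up sample: version 2, payload type 0 (G.711 mu-law), 160 bytes of audio.
sample = RTP_HEADER.pack(0x80, 0, 1234, 160, 0x1234ABCD) + bytes(160)
header = parse_rtp_header(sample)
print(header["version"], header["payload_type"], header["sequence_number"])
```

The sequence number and timestamp are what allow a receiver to notice lost packets and to measure jitter, which is the information RTCP reports back to the sender.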

Quality of Service (QoS) is one of the major issues for Softphones. When data traffic
experiences packet loss or significant delay, the packets are resent and the user
observes that the data is taking longer to send or receive. For VoIP traffic, significant
packet loss, delay, or jitter (variations in delay) is noticeable to the user. Resending
lost packets is not an option, as the conversation will have moved on by the time they
are retransmitted. QoS addresses these problems by enabling voice packets to get queuing
priority over data packets in the IP network.

The Softphone sets QoS and tells the IP network it needs priority in a couple of ways.
The first way is by setting the Diffserv bits in the IP header to a higher priority.
Layer 3 Ethernet switches will look at these bits and put these packets into a higher
priority queue. Another way is to set the Institute of Electrical and Electronics
Engineers (IEEE) 802.1p priority bits, which are sent in the IEEE 802.1Q virtual local
area network (VLAN) tag. Layer 2 Ethernet switches look at these bits and use them to
prioritize the packets. The VLAN tag also has a significant role in logically separating
the voice and data traffic, with voice traffic receiving one tag value and data traffic
another. Both of these methods of providing QoS work fine in DoD local area networks
(LANs), but they are currently not supported in the Non-secure Internet Protocol Router
Network or in the commercial Internet. This means Softphones used in a remote fashion
will not have any QoS and their traffic will receive no priority.

A further problem with QoS and Softphones is the need for QoS within the computer.
Computers are generally not set up to provide priority on the internal buses and
interfaces to certain applications. There is a means to provide priority to an
application within Windows, but this tends to make the system unstable. As a result of
the lack of QoS, the latency in the Softphone can be on the order of hundreds of
milliseconds, which is at a level where the human ear can begin to detect it and is
outside the 60-millisecond DoD end-to-end VoIP limit.
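
The two network marking methods described above (Diffserv bits in the IP header and 802.1p priority bits in the 802.1Q tag) both come down to placing a small priority value in a fixed bit position. The sketch below shows the bit layout only. The helper names are hypothetical, the choice of Expedited Forwarding (code point 46) and priority 5 for voice reflects common practice rather than anything stated in this article, and whether an operating system honors the IP_TOS socket option varies by platform.

```python
import socket

EF_DSCP = 46  # Expedited Forwarding code point, commonly used for voice (assumption)

def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit Diffserv code point in the upper bits of the IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must fit in 6 bits")
    return dscp << 2

def vlan_tci(priority: int, vlan_id: int, dei: int = 0) -> int:
    """Build the 16-bit 802.1Q tag control field: 3-bit priority, 1-bit DEI, 12-bit VLAN ID."""
    if not 0 <= priority <= 7:
        raise ValueError("802.1p priority must fit in 3 bits")
    if not 0 <= vlan_id <= 4095:
        raise ValueError("VLAN ID must fit in 12 bits")
    return (priority << 13) | ((dei & 1) << 12) | vlan_id

def open_voice_socket(dscp: int = EF_DSCP) -> socket.socket:
    """Create a UDP socket whose outgoing packets request the given DSCP marking."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock

print(hex(dscp_to_tos(EF_DSCP)))   # TOS byte for Expedited Forwarding
print(hex(vlan_tci(5, 100)))       # priority 5 on hypothetical voice VLAN 100
```

As the text notes, these markings only help where the switches and routers along the path are configured to act on them.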

Another technical issue for Softphones is traversing the NAT point. When all of the VoIP
devices are connected in the same LAN this is not an issue. However, when the calling
party is on one side of a NAT point and the called party is on the other, there is a
problem. The signaling message the calling party sends to the called party contains the
IP address of the calling party. When packets pass through the NAT point, the source IP
address in the packet header is changed, but the address inside the signaling message is
not. When the called party attempts to send packets to the IP address in the signaling
message, they are dropped (especially if private addressing was used). One current
solution is to use the Simple Traversal of User Datagram Protocol Through NATs (STUN)
protocol. This protocol works by having the Softphone communicate with a server outside
the NAT point. The server is able to see the Softphone’s IP address and port number as
they appear outside the NAT point and communicate this back to the Softphone. The
Softphone then uses this IP address and port number in its signaling messages.
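
The address discovery idea can be illustrated with a toy simulation. This is not an implementation of the actual protocol messages: the NAT class, the server function, and the addresses below are all hypothetical, and the simulation only shows why a server outside the NAT point can tell the Softphone which address to advertise.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    ip: str
    port: int

class Nat:
    """Toy NAT: rewrites each private source endpoint to a public one."""

    def __init__(self, public_ip: str, first_port: int = 40000):
        self.public_ip = public_ip
        self.next_port = first_port
        self.mappings = {}

    def translate(self, private: Endpoint) -> Endpoint:
        if private not in self.mappings:
            self.mappings[private] = Endpoint(self.public_ip, self.next_port)
            self.next_port += 1
        return self.mappings[private]

def address_server(observed_source: Endpoint) -> Endpoint:
    """A server outside the NAT simply reports the source address it saw."""
    return observed_source

def discover_public_endpoint(private: Endpoint, nat: Nat) -> Endpoint:
    # The request leaves through the NAT, so the server sees the translated address.
    return address_server(nat.translate(private))

softphone = Endpoint("192.168.1.10", 5060)   # private address, unreachable from outside
nat = Nat("203.0.113.5")                     # documentation-range public address
advertised = discover_public_endpoint(softphone, nat)
print(advertised)
```

The Softphone would place the advertised endpoint, rather than its private address, in its signaling messages so that the called party's packets can find their way back through the NAT point.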

VoIP devices, including Softphones, have a few tricks for reducing the amount of
bandwidth that they utilize. One of them is to use voice compression algorithms.
Uncompressed voice (G.711) uses 64 kilobits per second (Kbps) plus IP network overhead.
Other algorithms, such as G.729 (which uses eight Kbps), use less bandwidth. The drawback
is that voice quality may be affected. Current DoD policy only allows G.711, but this is
expected to change in the future, especially when VoIP goes to tactical units. Another
trick is to use Voice Activity Detection (VAD). When a Softphone uses VAD it only sends
voice packets when the user is talking. No packets are sent that contain silence. In a
typical conversation, only one person is talking at a time so there is audio in one
direction and silence in the other. When the silence packets are removed, the amount of
bandwidth utilized can be reduced by 50 percent or more. One feature to look for in a
Softphone that uses VAD is background noise insertion. Without this, the telephone
connection is so quiet during periods of silence removal that it appears the connection
is dead.
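
To make the phrase "plus IP network overhead" concrete, the following back-of-the-envelope sketch estimates the IP-layer bandwidth of one direction of a call. It assumes one packet every 20 milliseconds (a common setting, but an assumption here), IPv4 without options, and no link-layer framing; the function name is hypothetical.

```python
# Header bytes added to every voice packet: IPv4 (20) + UDP (8) + RTP (12).
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12

def ip_bandwidth_kbps(codec_kbps: float, packet_ms: float = 20) -> float:
    """Estimate one-way IP-layer bandwidth for a codec rate and packet interval."""
    packets_per_second = 1000 / packet_ms
    payload_bytes = codec_kbps * 1000 / 8 * packet_ms / 1000
    bits_per_packet = (payload_bytes + IP_UDP_RTP_HEADER_BYTES) * 8
    return bits_per_packet * packets_per_second / 1000

g711 = ip_bandwidth_kbps(64)   # uncompressed voice
g729 = ip_bandwidth_kbps(8)    # compressed voice
g711_with_vad = g711 * 0.5     # the text's figure of a 50 percent reduction

print(g711, g729, g711_with_vad)
```

Under these assumptions the headers add 16 Kbps to every stream regardless of codec, so compression reduces the payload eightfold but the total bandwidth by a smaller factor.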


An issue for VoIP and Softphones in the future will be Internet Protocol Version 6
(IPv6). Currently, all DoD IP networks are expected to be capable of transitioning to
IPv6 by 2008. The computer, the Softphone application, and the operating system will need
to support IPv6 for the Softphones to use IPv6. For the Softphone to work with the other
VoIP devices, the call processor/registration server, gateways, IP telephones, etc.
within its enclave will all need to be running IPv6. The IPv6 protocol may also have an
impact on the NAT problem. Due to the large address space of IPv6, it is anticipated that
IPv6 will make NAT unnecessary.

Security  Issues  With  Softphones 

Security is currently the most difficult issue to overcome with Softphones. The current
Defense Information Systems Agency Security Technical Implementation Guide (STIG) states,
“The use of Softphones is highly discouraged.” This is due to a number of issues related
to the nature of Softphones [3]. This section will go into these, along with the STIG
requirements, in more detail.

For VoIP implementations, the security requirements mandate that the voice and data
traffic be separated into networks, either physically or logically. Separate physical
networks require separate networking devices, such as switches and routers, for both data
and voice networks. Logical separation means that the traffic is separated into logical
networks, typically using VLANs. Data devices are connected to data network devices or
ports in the data network VLAN and likewise for the voice devices. The major issue with
Softphones is they tend to reside on computers having applications requiring access to
both data and voice networks. For example, the Softphone computer would have the
Softphone application, and then it might have other applications, such as e-mail, Web
browsing, etc., that require access to the data network. The following requirement in the
STIG addresses this issue:

(VoIP0150: CAT I) The Information Assurance Officer (IAO) will ensure that if/when
approved Softphones are used in the LAN, the following conditions are met:

•  The host computer contains a Network Interface Card (NIC), (commonly called a network
adapter) that is 802.1Q (VLAN tagging) and 802.1p (priority tagging) capable.

•  The host computer, NIC, and IP Softphone agent software is configured to use separate
802.1Q VLAN tags for voice and data.

•  Alternatively, dual NICs may be used where voice traffic is routed to one NIC and data
traffic is routed to the other. Each NIC is connected to an access switch port residing
in the appropriate VLAN.

•  The host computer will be connected to separate voice and data VLANs that have been
created expressly for the Softphone host(s). That is to say that the LAN should have a
voice VLAN and a data VLAN dedicated to hosts with IP Softphone agents installed. [3]

A couple of issues occur with implementing these requirements. The first is that most
computer NICs are not able to support VLAN tagging. This would make two NICs in the
computer necessary. The second is that some means needs to be in place to ensure that the
voice traffic only goes to the voice VLAN and the data traffic only goes to the data
VLAN. The major security concern here is a hacker coming into a computer on the data
network and routing over to the voice network.

The STIG also addresses the case where the Softphone is used in a computer that is
accessing the network remotely. The STIG states the following:

(VoIP0160: CAT I) The IAO will ensure that if/when approved Softphones are used in remote
connectivity situations, the following conditions are met:

•  The host computer connects to the “home LAN” through a Virtual Private Network (VPN)
connection.

•  The VPN is terminated at the enclave boundary in accordance with the Enclave STIG.

•  The voice and data traffic is routed appropriately to separate voice and data VLANs in
the “home LAN.”

•  The IP Softphone agent connects to the Call Manager (call processor) on the “home LAN”
through the VPN using “home LAN” IP addressing. [3]

Implementing this has the same issues as
connecting locally, namely keeping the voice
and data traffic separate. This is harder to do
remotely, as the remote computer would
need to tag the traffic appropriately and put it
into a VPN. There would also be QoS and
Joint Interoperability Test Command (JITC)
certification issues with using Softphones
remotely (this is discussed in the next section).

The  STIG  also  provides  the  following 
guidance  when  Softphones  are  used  in  a  call 
center: 

(VoIP0165: CAT I) The IAO will
ensure that, if/when approved
Softphones are used in a call center
situation; the call center network is
configured as a separate enclave and
secured in accordance with all applicable
STIGs.

This means that the call center VoIP traffic
must be separated, either physically or logically,
from the rest of the IP traffic, in addition
to complying with all of the other
STIGs.

Due to the security issues with
Softphones, the STIG also provides the following
guidance to Designated Approving
Authorities (DAAs):

(VoIP0130: CAT I) The IAO will
ensure that written DAA approval is
obtained prior to the use of any IP
Softphone agent software. The IAO
will maintain documentation pertaining
to such approval for inspection by
auditors.

(VoIP0135: CAT I) The IAO will
ensure a local IP Softphone policy
exists and is being enforced that
addresses the following:

• Prohibits the installation and use
of IP Softphone agent software
on workstations (fixed or portable)
intended for day-to-day use
in the user’s normal workspace.

• Prohibits the use of IP
Softphone agent software in the
user’s normal workspace, which
has been approved and installed
on a portable workstation for the
purpose of VoIP communications
while traveling.

• Prohibits the installation and use
of IP Softphone agent software
clients that are independently
configured by end users for personal
use or that is provided by
commercial Internet Telephony
Provider service providers.

•  Requires  prior  justification  and 
DAA  approval  for  the  use  of  any 
IP  Softphone  agent  software. 

•  Requires  that  the  justification  and 
DAA  approval  of  IP  Softphone 
agent  software  use  is  reviewed 
annually  and  approval  renewed  if 
justified. 


JITC  Certification  Issues 

Public law and DoD policy require that all
voice solutions attached to the Defense
Switched Network or PSTN obtain interoperability
and Information Assurance
certification. For VoIP, the voice solution
includes the call processors, registration
servers, IP telephones, gateways, and
Softphones. While a number of VoIP solutions
currently are certified, none of them
includes a Softphone. This is partly due to
difficulty in meeting the security and QoS
requirements and partly due to the question
of configuration change. The DISA/JITC
policy requires a VoIP solution to be recertified
if its configuration changes from what
was certified. How this would affect
Softphones is not yet known. For example, if
a computer with a certified Softphone
were to change its audio card to a different
brand, would it need to be recertified?
There is currently no experience with this
issue.

There is currently a disconnect in
DoD policy regarding the use of
Softphones from a remote location, such
as a hotel room. The STIG allows it under
certain circumstances, whereas the DISA
Generic Switching Center Requirements
(GSCR) (which contains the requirements
for interoperability certification) requires
end-to-end QoS and a certification of the
entire network the VoIP traffic will be
traversing [4].

One of the features that a Softphone
would need to support to obtain JITC
certification for command and control (C2)
users is MultiLevel Precedence and
Preemption (MLPP). MLPP allows a
caller with a higher precedence to preempt
a call of lower precedence. This is typically
used when high priority calls need to get
through and lines are tied up.
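The preemption rule can be illustrated with a minimal sketch. The five precedence names below are the standard MLPP levels; their numeric ordering, the function name, and the policy of dropping the lowest-precedence active call are assumptions made for illustration, not a description of any certified call processor.

```python
from enum import IntEnum
from typing import Optional, Sequence


class Precedence(IntEnum):
    """MLPP precedence levels; a higher value outranks a lower one (assumed encoding)."""
    ROUTINE = 0
    PRIORITY = 1
    IMMEDIATE = 2
    FLASH = 3
    FLASH_OVERRIDE = 4


def pick_call_to_preempt(active_calls: Sequence[Precedence],
                         incoming: Precedence) -> Optional[int]:
    """Return the index of the call to drop when every line is busy.

    The lowest-precedence active call is chosen, and only if the incoming
    call strictly outranks it; otherwise nothing is preempted.
    """
    if not active_calls:
        return None
    lowest = min(range(len(active_calls)), key=lambda i: active_calls[i])
    if active_calls[lowest] < incoming:
        return lowest
    return None


busy_lines = [Precedence.PRIORITY, Precedence.ROUTINE, Precedence.FLASH]
print(pick_call_to_preempt(busy_lines, Precedence.IMMEDIATE))  # drops the ROUTINE call
print(pick_call_to_preempt(busy_lines, Precedence.ROUTINE))    # nothing to preempt
```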

Currently,  all  JITC  certified  solutions 
consist  of  a  LAN  for  the  IP  network.  The 
use  of  VoIP  across  the  wide  area  network 
and  between  services  has  not  been  worked 
out.  Currently,  if  there  were  a  certified 
configuration  that  included  a  Softphone, 
the  Softphone  would  need  to  go  to  a 
PSTN  gateway  in  order  to  place  a  call  off 
of  an  installation. 

Conclusion 

While IP Softphones offer several advantages,
including mobility and a GUI for call
features, it may be a number of years before
they are common in DoD telephone systems,
with the possible exception of call centers.
This is due to a number of reasons.
Softphones are still awkward to use due to
the lack of a handset. Security and QoS
issues will make them difficult to implement
and secure. The lack of location awareness
when used as a mobile device makes them
risky for 911 use. Until JITC certifies a VoIP
solution that includes a Softphone, it will be
a violation of DoD policy to use one.

Recommendations 

The U.S. Army Information Systems
Engineering Command (USAISEC) Technology
Integration Center (TIC) recommends
a continuing effort to examine
Softphones, especially in applications such as
call centers. Due to the technical complexities
of complying with security and performance
requirements, we do not recommend any
significant move to replace traditional
telephones or IP telephones with Softphones at
this time.

Software project estimation is not what we think it is because, to some extent, software is not what we think it is. This
article explores an alternative view of both software and project estimation and concludes that the process of estimation could be
much more valuable than we usually make it.


Predicting the future can be a rewarding
occupation, but it can also be a dangerous
one. Historically, oracles were often
praised and lauded. But they were also
stoned to death if they were wrong - and
sometimes if they were right [1].

Software project estimation is a difficult
task for a simple reason: Software is not
really a product; it is a packaging of knowledge,
and we cannot measure knowledge.
Software is best thought of as a knowledge
storage medium rather than a manufactured
product [2]. It is one of the five places we
can put knowledge once we have obtained
it, the other four being (in historical order):
DNA (Deoxyribonucleic Acid), brains,
hardware, and books. However, knowledge
in software has different characteristics than
knowledge stored in the other media [3].

Shooting  Tanks 

During World War II, the knowledge of
how to hit a tank with a bazooka was stored
in several places: It was stored in military
manuals (the book form) and in the ranging
and sighting device (the hardware form), but
it was stored mostly in the operator’s head
(the brain form) (see Figure 1). There are
some drawbacks with these media: The
manual only describes the knowledge - it does
not actually do anything; the sighting mechanism
allows for storage and use of only a
few of the variables - mostly the distance-to-target
versus elevation relationship; and
the brain-resident knowledge has the distinct
disadvantage that the soldier could be
shot at while attempting to hit the target. In
modern weapons systems, we have moved
almost all of this knowledge into the missile
in an active software-resident form.

Not  Product  Producing 

If software is not a product, but is a medium,
then software development is not a
product-producing activity. In fact, it is best
thought of as a knowledge acquisition activity.
Most of the effort on a software project is
related to acquiring and validating knowledge
rather than creating a product. We
know what we are doing.


In attempting to estimate projects, we
are trying to figure out how much knowledge
we do not have and how much time
and effort it will take to get it, plus a small
amount of time and effort to translate it
into the executable form once we have
obtained it. There are two challenges to this:
First, we are trying to measure something
we do not have, which is always hard to do,
and second - and very importantly - we are
trying to measure knowledge, and knowledge
is simply not a measurable thing.

This leads us to some observations
about the essential nature of project estimation:

• We cannot have an accurate estimate.
Apart from it being an oxymoron, there
is a simple reason why estimates cannot
be accurate - we simply do not have the
data or knowledge we need to be accurate.
The primary activity of a software
project is to get this knowledge. The
only point in time where we can reasonably
assert we are accurate is at the end
of the project when we have acquired all
the knowledge and resolved all the
uncertainty.

It is possible to have a lucky estimate.
This happens when all the things
we did not or could not think of that
slowed the project down and all the
other things we did not think of that
sped the project up happen to be
equal. Since there are more things that
will slow a project down than speed it up
- an application of the 2nd Law of
Thermodynamics to projects - we usually
underestimate.

• The purpose of estimating is not to
come up with an end-date. This is
usually what we are asked for when
someone wants an estimate, but it is not
a well-formed request. For most projects
there is a wide range of possible dates
when the project might finish (see
Figure 2, page 28). At the point in time
when we produce the estimate, we can
posit a trade-off of probability of success
for schedule and other resources. It is
easy to be 100 percent successful in projects
- simply take a very long time and
use a very large number of very good
resources. In reality, the purpose of estimation
is not to deduce an end-date; it is
to derive the probability function that
describes the range of viable end-dates.
The project completion date and schedule
are not determined by estimation but
by the commitment process.

• Estimation is not commitment.
Making an estimate is not the same thing
as making a commitment. The job of
estimation is to identify the project’s
probability function. The job of the
commitment process is to select the
point along the probability function that
best manages the risk/return ratio.
Estimation is a technical activity;
commitment is a business activity, and they
operate on quite different data.

• The project estimate may not be
dependent on the delivered system
size. Despite the fact that every estimation
process used in software development
operates on the expected delivered
system size, the relationship can be quite
tenuous. The final system size may be an
indicator of the effort necessary to develop
the system. All else being equal, if
one system is expected to have twice as
many (say) lines of code as another, that
system will require proportionally more
time and effort. The trouble is, all else is
rarely equal. When development is viewed
as a knowledge acquisition activity, it is
clear why project effort may not have much
to do with the final size. If we use experienced
developers, they do not have to
acquire as much knowledge to produce
a system of a given size as less experienced
developers, but the system size
does not change. If we can reuse code,
either from a library or embedded within
a language, the effort is less, since
some of the knowledge is already stored
in an accessible software medium.

Figure 1: Bazooka Sighting Mechanism. Photo
property of <www.antiquefirearm.com> and
<www.andrax.com>. Used with permission.

Figure 2: Probability of Completion (Cumulative Probability Distribution)

Also, a system may be small in
terms of its line-of-code form, but it
may have very high knowledge density, as is
true of real-time embedded systems -
the amount of knowledge needed to
make them work divided by the final
executable size is much higher than for
typical business systems. However,
while the knowledge density of an
information technology system might
be light, it may constantly change as the
market changes, meaning the knowledge
must be reacquired. So, the effort,
schedule, staff, and cost may increase or
decrease without respect to system size
at all.

Almost all estimation processes
and tools provide a way of tuning the
final-size-driven estimate by adjusting
parameters that represent the system’s
attributes. We also trust these adjustments
will track to the effort necessary
to acquire the knowledge. The net result
of this tuning may entirely submerge
the effect of the system size.

• The effort-time relationship is not
linear. In fact, effort varies as a high-order
reciprocal power of time [4]. It is common
for organizations to believe that the process of
building software is, well, a building
process rather than a knowledge acquiring
process. Accordingly, they operate
on a set of assumptions based upon
manufacturing. This includes the relatively
linear relationship of effort (people,
machines) to time to deliver. In a
factory, if we double the number of
machines or run the machines twice as
long or twice as fast, we will approximately
double the output. This math
simply does not apply to software
because it is not a manufacturing
process. It also partly explains why
adding people to a project - particularly
when it is already running late and the
schedule is already compressed - is not
an effective tactic (see Figure 3).
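The non-linearity can be made concrete with a short sketch. The article states only that the relationship is a high-order reciprocal power; the fourth-power form used below is an assumption, taken from the way Putnam's software equation is commonly stated, and the function name is hypothetical.

```python
def relative_effort(schedule_ratio: float, exponent: float = 4.0) -> float:
    """Effort relative to the nominal plan when the schedule is scaled.

    schedule_ratio is planned duration divided by nominal duration, so 0.9
    means a 10 percent compression. The default exponent of 4 is an assumed
    value for the reciprocal power, not a figure given in the article.
    """
    if schedule_ratio <= 0:
        raise ValueError("schedule ratio must be positive")
    return 1.0 / schedule_ratio ** exponent


for ratio in (1.2, 1.0, 0.9, 0.8):
    print(f"schedule x{ratio:.1f} -> effort x{relative_effort(ratio):.2f}")
```

Under this assumption, a 10 percent schedule compression costs roughly 52 percent more effort and a 20 percent compression more than doubles it, whereas a linear, factory-style model would predict that total effort stays constant when staff is added to shorten the schedule.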

The  Job  of  Estimation 

The real job of estimation is not what it
seems. True, it does have a very important
role in determining the basic planning parameters
of staff size, project duration, effort
and cost (and for some estimation models,
quality and defects). This is the classical “tell
me when the project will be done” role of
estimation.

The calibrations necessary to achieve a
useful, as opposed to accurate, answer from
an estimate are complex. They characterize
the system being built, the environment
and people working on the project,
the management of that environment, the
tools and their effectiveness, the level of
documentation, and many other factors.
This is clearly seen in the operation of
many parametric estimation tools and
processes. For instance, the COCOMO II
model uses Scale Factors that determine
the size exponent. These include the
following:

• System/project attributes: How new the
project is.

• Project/process attributes: Development
process flexibility and architecture
risk resolution.

• Team attributes: Team cohesion.

• Organizational attributes: Process
maturity [5].

COCOMO II further expands on its
characterization with a set of cost driver factors
which reflect everything from the documentation
level to the complexity of the system
being built.
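How the Scale Factors and cost drivers enter the calculation can be sketched as follows. The form of the equations (effort = A x size^E x product of cost-driver multipliers, with E = B + 0.01 x sum of Scale Factors) and the constants A = 2.94 and B = 0.91 follow the published COCOMO II.2000 calibration. The Scale Factor ratings below are meant to be the nominal values and should be checked against the model definition [5] before any real use; the 100 KSLOC size is purely illustrative.

```python
import math

A = 2.94  # COCOMO II.2000 effort coefficient
B = 0.91  # COCOMO II.2000 base size exponent


def size_exponent(scale_factors: dict) -> float:
    """E = B + 0.01 * (sum of the five Scale Factor ratings)."""
    return B + 0.01 * sum(scale_factors.values())


def effort_person_months(ksloc: float, scale_factors: dict,
                         effort_multipliers=()) -> float:
    """Effort = A * size^E * product of cost-driver multipliers."""
    return A * ksloc ** size_exponent(scale_factors) * math.prod(effort_multipliers)


# Assumed nominal ratings, keyed by the attributes listed above.
nominal = {
    "precedentedness": 3.72,          # how new the project is
    "development_flexibility": 3.04,
    "risk_resolution": 4.24,          # architecture risk resolution
    "team_cohesion": 3.29,
    "process_maturity": 4.68,
}

exponent = size_exponent(nominal)
effort = effort_person_months(100, nominal)  # all cost drivers left at 1.0
print(f"size exponent E = {exponent:.4f}")
print(f"effort for 100 KSLOC = {effort:.0f} person-months")
```

Because the exponent exceeds 1 at these ratings, doubling the size more than doubles the estimated effort, and a few cost-driver multipliers away from 1.0 can move the result as much as a large change in size.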

Given a reasonably sound characterization
of an environment and system using
these types of factors - a very significant
task in itself - an estimation process
becomes an analogue of the projects being
run in that environment.

A  Certain  Uncertainty 

Even if we think we are able to accurately size
a projected system, and we have good data
which we think characterizes the team, system,
and environment, we invariably find
that these are not certain. With each factor,
there comes some degree of variability: We
may have been this productive in the past,
but our productivity may be different now.
The project might be this big, but it could be
that big. It could be on the simple side of
complex, or the complex side of simple.
The new tools or language we are planning
to use might help a lot, a little, or they might
require more effort than they save. The only
way to be truly certain is to try it.

Figure 3: High Order Reciprocal Power

For each of the factors, we can postulate
that they will likely operate over a range
of values. The product of all these variances
determines the aggregate uncertainty
of the project and defines the slope of the
cumulative probability S-curve in Figure 2.
There are many challenges to calculating
these ranges, not the least being that the factors
are not independent. Processing the
individual variances in a statistically legitimate
way allows us to calculate the total
uncertainty in the final project solution(s).
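One way to see how individual ranges combine is to sample them. The sketch below is a deliberately simplified illustration: the three factors, their ranges, and the formula that combines them are all hypothetical, and the factors are sampled independently even though, as noted above, real factors are not independent.

```python
import random


def simulate_effort(n_trials: int = 10_000, seed: int = 1) -> list[float]:
    """Sample each uncertain factor and return the sorted effort outcomes."""
    rng = random.Random(seed)
    outcomes = []
    for _ in range(n_trials):
        # random.triangular takes (low, high, mode); all ranges are made up.
        size_ksloc = rng.triangular(80, 150, 100)
        productivity = rng.triangular(0.15, 0.30, 0.22)  # KSLOC per person-month
        tooling = rng.triangular(0.85, 1.25, 1.00)       # new tools may help or hurt
        outcomes.append(size_ksloc / productivity * tooling)
    return sorted(outcomes)


def percentile(sorted_values: list[float], fraction: float) -> float:
    """Value below which roughly the given fraction of outcomes fall."""
    index = min(len(sorted_values) - 1, int(fraction * len(sorted_values)))
    return sorted_values[index]


outcomes = simulate_effort()
p10, p50, p90 = (percentile(outcomes, f) for f in (0.10, 0.50, 0.90))
print(f"10%: {p10:.0f}  50%: {p50:.0f}  90%: {p90:.0f} person-months")
```

The gap between the 10 percent and 90 percent values is the aggregate uncertainty; widening any single input range widens that gap and flattens the S-curve.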

Estimation  as  Simulation 

Here we get to the real purpose of estimation.
If we have reasonably characterized
the environment, if we have established
some operational variance for the size of
the system and its complexity, if we have
some idea of the ranges of difficulty in
obtaining the knowledge for the system,
and if we have some calibrated way of processing
this information, we could simulate
what might happen when we run the
project.
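Whatever model produces them, the simulated outcomes can be turned into the probability function that the commitment process needs. The sketch below assumes a stand-in set of simulated durations drawn from a lognormal distribution with a median of 18 months; that distribution, the 80 percent confidence level, and the function names are illustrative assumptions only.

```python
import math
import random


def probability_of_completion(durations: list[float], deadline: float) -> float:
    """Fraction of simulated runs that finish on or before the deadline."""
    return sum(1 for d in durations if d <= deadline) / len(durations)


def commitment_point(durations: list[float], confidence: float) -> float:
    """Shortest duration that at least `confidence` of the runs meet."""
    if not 0 < confidence <= 1:
        raise ValueError("confidence must be in (0, 1]")
    ordered = sorted(durations)
    index = max(0, math.ceil(confidence * len(ordered)) - 1)
    return ordered[index]


# Stand-in for the output of a calibrated project simulation (assumed data).
rng = random.Random(7)
durations = [rng.lognormvariate(math.log(18), 0.25) for _ in range(5000)]

for months in (14, 18, 22, 26):
    chance = probability_of_completion(durations, months)
    print(f"finish within {months} months: {chance:.0%}")

print(f"80% confidence commitment: {commitment_point(durations, 0.80):.1f} months")
```

The first function traces the S-curve, which is the estimator's output; the second picks a point on it, which is a business decision about how much risk to accept.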

The concept of a statistical approach to
the management of software (especially
under the umbrella of Six Sigma) has its
detractors and they have some very good
points [6]. Developing software is not the
repetitive cranking out of identical units.
Indeed, doing something differently from
the last time is a bad thing from a manufacturing
perspective, and most of the effort in
statistical process control is dedicated to
identifying, analyzing, and removing variance.
But in software, variance is the reason
why we have a project at all - if we wanted
to do it just like the last time, we would simply
use whatever we produced last time.
Again, software is not a product at all; it is a
medium in which we express, store, and
make active the knowledge that we gain
when we run a project. It is the knowledge
that is the thing; the software is simply
where we put it.

We do not have the resources to run the
same project multiple times to see how to
run it best. We only get to run a project
once. However, we could set up an estimation
system, with certified and controlled
inputs, that reasonably collects the likely
variances in the key characterizing factors of
product, personnel, technology, and environment,
and we could model the interaction
of these variables reasonably well.
Doing this, we can simulate the behavior of
our organization when it runs the project
under a given set of conditions. These
results can only be expressed in probabilistic
terms - since we have uncertain inputs,
we must have uncertain outputs. These outputs
look a lot like statistical variance analyses,
even though we only have one project.
Many companies have and use various estimation
processes and tools, but few have
established estimation systems that control,
audit, and report project estimation and risk
data in the same way we control, audit, and
report on accounting data.

The financial management sections in
most companies have financial models that
they create, manage, and use. These models
incorporate key factors in the financial markets:
inflation, growth in Gross National
Product, cost of capital, market sensitivity,
etc. Companies find these tools very valuable
in helping to understand what kinds of
decisions might be more optimal than others.
Do these tools predict the future? No.
They cannot do that. But they can and do
help in the financial management of companies;
they are very valuable tools and systems.
We could do the same thing for software
projects and estimation.

Comedian Woody Allen once remarked,
“the only thing I cannot accurately predict
is the future.” What an estimation simulator
could do is help identify more (or less)
optimal decisions about how we run our
projects, before we actually run them. We
often teach pilots on simulators. They do
not replace learning on the real thing, but
you can try things and test out behaviors on
a simulator that you would not want to try
on the real thing.

Fifteen  Too  Many,  One  Too  Few 

Several months ago, a client of mine was
considering implementing a large project in
15 equal increments spread over three years.
Using estimation tools to model the whole
system in a one-release, three-year delivery,
big bang approach, we showed that this was
not a highly constrained system. However,
modeling the 15 increments in our estimation
system showed that the sum of the
parts was a lot bigger than the whole. We
could demonstrate that the overlapping
increments inserted a very high degree of
risk into the project that would only
become evident some way down the line.
This project would look pretty good for
about two years and then the wheel would
fall off. The big bang approach was much
less risky, but the customer would see no
value for three years. We modeled many
possible solutions including a four, unequal-increment
solution that we were able to
demonstrate would deliver the most functionality
to the customer at the earliest date,
with the lowest risk.

We could have learned that the 15-increment
solution was a bad idea by trying it,
thereby costing the company several million
dollars. Or, we could learn the same lesson
by simulating what would happen and preemptively
picking a more reasonable course.

There is little doubt we need to improve
our performance in software project estimation.
The stakes are very high, but if we
align our expectations of estimation in
accordance with the reality of software
development and set up our organizations
to feed a software development business
simulation system, the rewards will be even
higher. We can do that.

References 

1. Wood, Michael. The Road to Delphi:
Scenes From the History of Oracles.
New York: Farrar, Straus and Giroux,
2003.

2. Armour, P.G. “The Case for a New
Business Model.” Communications of
the ACM 43.8 (2000): 19-22.

3. Armour, P.G. The Laws of Software
Process. Boca Raton, FL: Auerbach
Publishers, 2003.

4. Putnam, Lawrence H., and Ware Myers.
Measures for Excellence. Englewood
Cliffs, NJ: Yourdon Press/Prentice Hall,
1992.

5. Boehm, Barry W., et al. Software Cost
Estimation with COCOMO II. Upper
Saddle River, NJ: Prentice Hall, 2000.

6. Binder, Robert V. “Can a Manufacturing
Quality Model Work for Software?”
IEEE Software 14.5 (1997): 101-105.

About  the  Author 

Phillip G. Armour is a
senior consultant for
Corvus International,
Inc. He is a contributing
editor at Communications of
the ACM and authored
the book “The Laws of Software
Process.”

Corvus  International,  Inc. 

205  Briargate  LN 
Deer  Park,  IL  60010 
Phone:  (847)  438-1609 
E-mail:  armour@corvusintl.com 


April  2008 


www.stsc.hill.af.mil  29 


Departments 


Systems & Software
Technology Conference


29 April - 2 May 2008 • LAS VEGAS, NEVADA

Technology: Tipping the Balance


Welcome to the 20th installment of the Systems and
Software Technology Conference (SSTC). Just about
20 years ago, Tetris was becoming the computer game
of choice. VGA Graphics were gaining in popularity. A
company known by its initials, IBM, was delivering the
PS/2 computer. It had a funny new pointing device
mouse. And the first ever Software Technology Conference
was held. From the beginning, SSTC has focused on
technology relating to the Department of Defense. We
begin another decade with this same focus.

Our theme will be “Technology: Tipping the Balance.”
The idea behind the theme is to explore new and needed
technologies, as well as lessons learned, which tip the

Science Fair, Farce, and Free-For-All


I  started  this  article  on  January  21st,  the  most  depressing 
day  of  the  year  [1].  Dr.  Cliff  Arnall  gauged  the  third 
Monday  of  January  to  be  the  most  depressing  day  using  a 
formula  based  on  weather,  holiday  debt,  and  failed  resolu¬ 
tions. 

The exception to Arnall’s formula involves engineers with
children in kindergarten through eighth grade. The brightly
colored notice they receive each year for the school science
fair offers a respite from depression. The prospect of tinkering
with science offsets the depressing effects of weather, debt,
and failed resolutions.

Let’s be honest: Most engineers
engage in little or no
engineering. They entered
engineering to design and
build, but the dirty little secret
they don’t tell you in college is
that only a small percentage of
engineers actually design or
build. Most engineers document,
configure, test, meet,
review, manage, meet, inspect,
and meet again, but few
design. Those who do design
rarely get hands-on building
projects, and hands-on software
is non-existent (note:
software builds and keyboard
strokes do not count).

When that science fair
paper hits home, the pent-up
frustration of deprived engineers
uncorks like a potato out
of a butane-fueled polyvinyl
chloride pipe - another release
activity for hamstrung engineers.
Wheels start turning,
the engineering paper comes
out, and the Home Depot
account mounts.

Now, I’m a big fan of
parental involvement in school
activities, and I also support
alleviation of engineering
frustration, but I must caution
my fellow engineers: Do not
overdo it. Here are signs your
child may not be getting the
expected science project experience:

•  The  project  takes  more  than  20  minutes  to  set  up. 

•  Armed  guards  are  required  to  protect  the  project. 

•  You  ask  the  janitor  for  a  high  voltage  outlet. 

•  Lloyds  of  London  insures  the  project. 

•  You  have  to  return  the  derrick  crane  by  4:00  p.m. 

• Occupational Safety and Health Administration inspection
is required.

•  Your  child  answers  all  questions  with,  “Mom!” 

Keep your project simple, involve your child, and, above
all, please leave the volcanoes, Mentos geysers, cake baking
instructions, and soda pop-soaked teeth at home.

My  daughter,  Hannah,  was  inspired  by  MythBusters  to 
determine  the  fastest  way  to  cool  a  can  of  soda  pop  [2].  The 
results,  in  the  Cool  It  Pop  graph,  determined  that  a  cooler 
full  of  ice  and  salt  water  is  your  best  bet  —  provided  you 
don’t  have  a  fire  extinguisher  on  hand. 

Hannah noticed the refrigerator and freezer seemed very
slow to cool. I noticed Hannah left the door open during
measures, diminishing the refrigerator’s cooling ability. She
modified her measurement
by pulling the can out of the
refrigerator, closing the door
during measurement, and
then returning it. The results
were much better as displayed
in the Keep the Door
Closed graph.

What can the science fair
teach us about tracking engineering
projects? First and
foremost, if you manage
engineers, give them at least
one task requiring designing,
tinkering, or building. This
act alone will save both your
sanity and the engineer’s.

Second, resist the temptation
to over-measure. If measurements
become more important
than the project itself, your
measures will be sullied. Keep
the door closed and let your
engineers engineer.

Third, unbiased and precise
measures are impossible.
Factor that into your analysis,
be tolerant on precision,
and vigilant on accuracy (see
[3] to discern the difference).

Finally, waste not, want not
- keep it simple. Not simple
minded; simple to implement,
simple to measure,
simple to use, and simply
effective.


— Gary  A.  Petersen 

Arrowpoint Solutions, Inc.
gpetersen@arrowpoint.us